
About the role
<p>Job Description: Manager – Monitoring Operations</p>
<p>Role Summary</p>
<p>The Manager – Monitoring Operations will lead and manage the enterprise monitoring operations team responsible for the availability, performance, and reliability of IT infrastructure and applications. This role will oversee the day-to-day operations of the BMC Helix On-Premises monitoring tool deployed on Red Hat OCP (OpenShift Container Platform), network and device monitoring using ParkPlace Entuity, and OS monitoring using Prometheus-Grafana, ensuring high service quality, operational excellence, and continuous improvement.</p>
<p>The role requires strong people management skills, deep technical expertise in systems monitoring platforms, and experience operating monitoring solutions in containerized environments.</p>
<p>Key Responsibilities</p>
<p>· Lead, mentor, and manage a team of monitoring engineers/analysts, defining goals, KPIs, shift coverage, and on-call rotations.</p>
<p>· Drive skill development through performance reviews, training initiatives, and continuous learning plans.</p>
<p>· Act as the escalation point for major monitoring incidents and outages, guiding quick workarounds to prevent monitoring gaps and loss of metrics.</p>
<p>· Ensure operational excellence aligned with ITIL practices (Incident, Problem, Change) and adherence to security, compliance, and operational standards.</p>
<p>· Manage upgrades, patches, capacity planning, and health checks across the monitoring estate to maintain high availability and performance.</p>
<p>· Oversee server (Windows/Linux/AIX), network, database, and synthetic URL monitoring for the enterprise and for global clients’ private cloud.</p>
<p>· Collaborate with Container Platform, Core Infrastructure, and Network teams on platform stability, scaling, resilience, and resource allocation.</p>
<p>· Optimize alert quality, reduce alert fatigue, standardize dashboards/alerting frameworks, and deliver actionable insights.</p>
<p>· Maintain SOPs, runbooks, and operational documentation; provide regular reports on platform health, incidents, and SLA compliance.</p>
<p>· Serve as the primary stakeholder contact for all monitoring services.</p>
<p>· Conduct annual disaster-recovery (DR) tests for the monitoring estate to validate resilience, recovery procedures, and business continuity readiness.</p>
<p>Required Experience & Qualifications</p>
<p>Experience</p>
<p>· 10+ years of overall IT industry experience, including 5+ years in monitoring operations in medium-to-large organizations.</p>
<p>· Hands-on operational expertise with at least two of the following monitoring platforms/tools:</p>
<p>o BMC Helix Monitoring (SaaS or On-Prem)</p>
<p>o Red Hat OpenShift Container Platform (OCP) or Kubernetes cluster management</p>
<p>o Prometheus, Exporters, OTEL Collectors, and Grafana</p>
<p>o ParkPlace Entuity Network and Hardware Monitoring</p>
<p>· Proven experience in monitoring architecture design, capacity planning, performance tuning, and integration with ITSM tools for automated ticketing workflows.</p>
<p>· Strong knowledge of ITIL processes and operational best practices.</p>
<p>Leadership & Soft Skills</p>
<p>· Strong people-management and leadership capabilities</p>
<p>· Excellent communication and stakeholder-management skills</p>
<p>· Ability to handle high-pressure situations and lead incident response</p>
<p>· Strategic mindset with a focus on operational maturity and optimization</p>
<p>Education & Certifications</p>
<p>· Bachelor’s degree in Computer Science, Information Technology, or an equivalent field</p>
<p>· Relevant certifications (preferred, not mandatory):</p>
<p>o Red Hat OpenShift / Kubernetes</p>
<p>o BMC Helix</p>
<p>o Foundation certifications in ITIL and/or AI</p>
<p>Nice-to-Have</p>
<p>· Exposure to hybrid or multi-cloud environments</p>
<p>· Experience with automation, scripting, APIs, and AI-driven service improvements</p>
<p>· Application Performance Monitoring (APM) experience</p>