
- Employment
- Full-time
About the role
- Annual Wellness Bonus
- Monthly Edenred Electronic Food Voucher
- Udemy: Access for your professional development
- Flexible Holiday plan & other leave benefits
- Book Benefit: Professional development books and an additional annual budget for fiction books of your choice
- Subsidised sports card and many other benefits!
What You’ll Be Doing:
- Own the end-to-end Problem Management lifecycle in line with ITIL best practice: problem detection, logging, categorisation, prioritisation, investigation, resolution, and closure
- Maintain and govern the Problem Record backlog in Jira Service Management, ensuring all records are accurate, prioritised, and progressing toward resolution
- Define and enforce the standards for problem identification, including criteria for reactive problem management (post-incident) and proactive problem management (trend analysis and risk identification)
- Manage the Known Error Database (KEDB), ensuring it is current, accurate, and actively used by L1/L2 support teams to improve first-contact resolution
- Lead and facilitate structured RCA sessions following major and recurring incidents, using recognised methodologies (e.g. 5 Whys, Fishbone/Ishikawa, fault tree analysis)
- Produce high-quality Problem Records and RCA reports that clearly articulate the root cause, contributing factors, timeline, and recommended corrective/preventative actions
- Ensure RCA outputs translate into tracked, accountable action plans with clear owners, timelines, and success criteria
- Challenge superficial root cause findings and push for systemic, durable fixes rather than symptomatic workarounds
- Analyse incident, change, and event data to proactively identify trends, recurring issues, and systemic risks before they become major incidents
- Collaborate with Observability and Platform teams to use monitoring signals, error budgets, and SLO breach data as early-warning inputs to the problem management process
- Contribute to the shift-left support agenda by feeding problem findings into runbooks, playbooks, and operability improvements
- Communicate problem status, known errors, and risk exposure clearly to technical and non-technical stakeholders, including engineering leads and senior management
- Produce regular problem management reporting, including metrics such as the number of open problems by age and severity, incident recurrence rate, time to root cause, and the percentage of problems whose preventative actions are closed on time
- Present insights and trends to the Director of Application Operations and wider PETO leadership to inform prioritisation decisions and continuous improvement initiatives
- Work closely with Incident Management to ensure seamless handoff from major incidents into the problem management process
- Partner with L2.5/L3 engineering teams to coordinate investigation effort, agree timelines, and remove blockers to root cause resolution
- Integrate problem management activity into the Service Catalogue and Jira Service Management workflows, ensuring service ownership and escalation paths are respected
- Contribute to Change Management processes by ensuring known problems and risks are visible to change approvers, reducing the risk of change-induced incidents
- Continuously assess and improve the Problem Management process itself, maturing capability over time and aligning with evolving ITIL and organisational standards
- Build and maintain problem management documentation, templates, and guidance to enable consistent, high-quality practice across the PETO organisation
- Support the development of L2 team capability in recognising and logging potential problems, contributing to the team's progression toward greater autonomy
Experience and Skills You Need in this Role:
- Solid, demonstrable experience in an ITIL-aligned Problem Management role, ideally within a fast-paced, product-led technology organisation
- Strong working knowledge of ITIL Problem Management practices (ITIL 4 Foundation certification or above preferred), including the distinction between reactive and proactive problem management and the role of the KEDB
- Hands-on experience facilitating RCA sessions using structured methodologies (5 Whys, Fishbone, fault tree analysis, etc.) and translating findings into actionable improvement plans
- Experience working with Jira Service Management or a comparable ITSM platform to manage problem records, workflows, and reporting
- Ability to analyse incident and operational data to identify trends and systemic issues, with experience using dashboards or reporting tools to communicate findings
- Strong written and verbal communication skills, with the ability to produce clear RCA reports and updates for both technical audiences and senior non-technical stakeholders
- Collaborative working style with experience engaging engineering, infrastructure, and operations teams in problem investigation and resolution
- Familiarity with Agile ways of working and the ability to integrate ITIL practices within a modern, product-centric engineering environment
- Experience with observability and monitoring tooling (e.g. Datadog, Grafana, PagerDuty) as inputs to proactive problem management
- Understanding of SLOs, error budgets, and their relationship to operational risk and problem prioritisation
- Experience contributing to or maintaining a knowledge base (e.g. Confluence), including runbooks and known error documentation
- Exposure to cloud-native application architectures and API-first platforms
- ITIL 4 Specialist or Practitioner certification in relevant practices (e.g. Problem Management, Incident Management)
- Experience with operational metrics and reporting frameworks, including DORA metrics or similar
The Interview Process:
- Screening call with a Talent Acquisition Partner
- First Stage Interview with the Director of Application Operations & the VP Platform Engineering