Senior Technical Product Manager, Observability
nscaleoperationsukltd
US · 3d ago
- Seniority
- Senior
About the role
<p><strong>About Nscale</strong></p>
<p>Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform. We own the data centres, software, and applications that power today's AI stack using sustainable technology solutions. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work. Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do.</p>
<p><strong>About the role</strong></p>
<p>Technical Product Managers at Nscale own the definition, delivery, and ongoing evolution of a slice of the Nscale platform, partnering with engineering, design, and go-to-market to turn customer and operational problems into shippable outcomes. As a Senior Technical Product Manager for Observability, you own the platform that gives customers and internal operators real-time visibility into their GPU fleet: the telemetry pipeline that scrapes data from physical infrastructure, the aggregation and storage layer, and the observability surfaces (logs, metrics, and traces) that enable fleet management, incident response, and alerting at scale. You partner daily with Fleet Software, Network Engineering, Data Centre Operations, and customer teams to make fleet health visible, actionable, and reliable as Nscale scales from a handful of deployments to a globally distributed fleet.</p>
<p> </p>
<div><strong>What you'll be doing</strong></div>
<ul>
<li>Own the roadmap for Nscale's observability platform: the telemetry pipeline, log and metrics aggregation, trace collection, and customer-facing APIs and dashboards that surface fleet health to customers and operators.</li>
<li>Define how logs, metrics, and traces are captured from physical infrastructure, aggregated, and surfaced through the observability platform to enable customers to manage their fleet and handle incidents.</li>
<li>Own alerting strategy and optimisation: define what matters, reduce noise, and ensure the right signal reaches the right person at the right time.</li>
<li>Capture and prioritise new telemetry requirements as the fleet scales, working with engineering to extend coverage across new hardware, sites, and deployment types.</li>
<li>Shadow incident reviews and site operations to turn recurring manual effort and visibility gaps into platform capabilities.</li>
<li>Define and drive the metrics that matter: alert signal-to-noise ratio, time-to-detect, time-to-resolve, telemetry coverage, and platform reliability.</li>
<li>Mentor junior PMs and raise the bar for PRDs, reviews, and product decisions across the team.</li>
</ul>
<p> </p>
<div><strong>What you need</strong></div>
<ul>
<li>5–8 years in product management, with a track record of owning significant areas in observability, infrastructure, or operations-facing products.</li>
<li>Demonstrated experience building observability stacks: you have owned a product that captures and surfaces logs, metrics, and traces at scale, and you understand the architectural and UX tradeoffs involved.</li>
<li>Hands-on experience with Prometheus, Loki, Mimir, Datadog, Grafana, or OpenTelemetry.</li>
<li>Experience with deployment tooling in a data centre or infrastructure context, including provisioning workflows, networking automation, or zero-touch deployment pipelines.</li>
<li>Experience building for operators and delivery teams (design engineers, project controllers, PMs, SREs, DC technicians) and a genuine appetite for their workflows.</li>
<li>Strong technical fluency: you can lead architecture and trade-off discussions across telemetry pipelines, time-series storage, alerting systems, and observability integrations.</li>
<li>A record of moving ambiguous operational problems to shipped outcomes that measurably improve visibility, incident response, or fleet reliability.</li>
<li>Excellent written and verbal communication across engineers, operators, and executives.</li>
</ul>
<p> </p>
<div><strong>Nice to haves</strong></div>
<ul>
<li>Broader experience in the observability problem domain, across toolsets beyond the stack listed above.</li>
<li>Familiarity with bare-metal provisioning tools (OpenStack Ironic, MAAS, or similar) or network automation tooling (NetBox, Nautobot, or similar).</li>
<li>Degree in CS or engineering, or prior experience as an engineer, SRE, or infrastructure operator.</li>
<li>Familiarity with GPU or accelerated compute infrastructure, data centre operations, or hyperscaler-style deployment at scale.</li>
<li>Experience with ITSM tools: Jira Service Management, ServiceNow, Zendesk, or Freshservice.</li>
<li>Experience in high-growth environments where the product is being built alongside the fleet it monitors.</li>
</ul>
<p>Join Nscale as we build a world-class AI cloud platform. If you're excited about owning the software that turns contracts into live GPU capacity, we'd love to hear from you!</p>
<p>At Nscale, we are committed to fostering an inclusive, diverse, and equitable workplace. We believe that a variety of perspectives enriches our work environment, and we encourage applications from candidates of all backgrounds, experiences, and abilities. We strongly encourage applications from people of colour, the LGBTQ+ community, people with disabilities, neurodivergent people, parents, carers, and people from lower socio-economic backgrounds.</p>
<p>If there’s anything we can do to accommodate your specific situation, please let us know.</p>
<p>The responsibilities outlined in this job description are not exhaustive and are intended to provide a general overview of the position. The employee may be required to perform additional duties, tasks, and responsibilities as assigned by management, consistent with the skills and qualifications required for the role.</p>
<div class="content-pay-transparency"><div class="pay-input"><div class="description"><p>The range below reflects the base salary for the position. Actual compensation may vary based on job-related factors such as skill set, experience, education, and location. In addition to base salary, this role may be eligible for bonus, equity, and/or commission programs. Nscale may offer a competitive benefits package including medical, dental, vision, flexible paid time off, parental leave, and retirement plan participation.</p></div><div class="title">Salary Range</div><div class="pay-range"><span>$200,000</span><span class="divider">—</span><span>$280,000 USD</span></div></div></div>
<div class="content-conclusion"><p><em>For information on how Nscale handles candidate personal data, please see our Employee &amp; Candidate Privacy Notice: <a href="https://drive.google.com/file/d/1QK5Yg04WHD9K9IAtJgQWubJZC9oLvatK/view?usp=sharing" target="_blank">Here.</a></em></p></div>
Perks & benefits
- Paid Time Off
- Equity Compensation