
Principal SRE

Intermedia Intelligent Communications
Worldwide, Remote, posted 1w ago
Employment
Full-time
Seniority
Staff

About the role

What you will be doing:

  • Partner with Engineering teams to design resilient services, architectures, and deployment patterns. 
  • Define and promote SRE practices including SLIs, SLOs, error budgets, capacity planning, incident response, and post-incident learning. 
  • Identify systemic reliability risks and work with teams to address root causes. 
  • Help reduce operational toil through automation, tooling, and better engineering practices. 
  • Work actively with Engineering teams during design, development, and production-readiness reviews. 
  • Advise and challenge teams on service architecture, fault tolerance, scalability, observability, deployment safety, and operational readiness, helping them to make pragmatic trade-offs. 
  • Support teams in diagnosing complex performance, latency, throughput, and resource-utilisation issues. 
  • Help establish engineering standards and reusable patterns for reliable, maintainable services. 
  • Lead investigations into performance bottlenecks across applications, infrastructure, databases, queues, networks, and third-party dependencies. 
  • Improve observability through metrics, logs, traces, dashboards, alerting, and service-level indicators. 
  • Help teams design meaningful alerts that identify user-impacting issues while reducing noise. 
  • Drive capacity planning and load-testing practices for critical systems. 
  • Build and improve automation, deployment tooling, infrastructure-as-code, monitoring, and reliability platforms. 
  • Contribute to CI/CD improvements, release safety, rollback strategies, and progressive delivery practices. 
  • Develop tools that help Engineering teams self-serve reliability, diagnostics, and operational insights. 
  • Improve cloud, container, and orchestration environments with a focus on security, reliability, and scalability. 
  • Participate in incident response for high-priority production issues. 
  • Lead or contribute to blameless post-incident reviews. 
  • Ensure actions from incidents result in improvements to architecture, tooling, monitoring, or process. 
  • Mentor engineers on production ownership and operational best practices. 

What you will bring to the role:

  • Experience in Site Reliability Engineering or senior backend/software engineering roles. 
  • Software engineering background, with the ability to write clean, maintainable production code. 
  • Experience working with Engineering teams to influence architecture and improve production readiness. 
  • Understanding of distributed systems, scalability, resiliency patterns, failure modes, and performance engineering. 
  • Experience diagnosing complex production issues across application and infrastructure layers. 
  • Hands-on experience with cloud platforms such as AWS, Azure, or GCP. 
  • Hands-on experience with on-premise environments and virtualization. 
  • Experience with containers and orchestration technologies; Kubernetes is a must. 
  • Knowledge of observability tooling, including metrics, logging, tracing, dashboards, and alerting. 
  • Experience with infrastructure-as-code tools such as Terraform. 
  • Experience with CI/CD pipelines and safe deployment practices. 
  • Strong scripting or programming skills in languages such as Python, Go, Java, C#, JavaScript/TypeScript, or similar. 
  • Clear, structured communication skills, with the ability to explain complex technical issues to engineering and leadership audiences. 

Diversity, Inclusion, and Equal Opportunity
