Site Reliability Engineer

Fabrichealth

Worldwide · Remote · 2w ago
Employment
Full-time

About the role

  • Infrastructure & Kubernetes Orchestration
    • Designing, deploying, and maintaining production Kubernetes (EKS) clusters to ensure enterprise-grade availability for our users.
    • Eliminating manual configuration by defining and managing all infrastructure state in Terraform so that it scales with the platform.
    • Optimizing the AWS footprint—specifically EC2, RDS, and S3—to balance high performance with cost-efficiency and reliability.
  • AI-Assisted Operations & Automation
    • Exploring and deploying agentic workflows for AI-assisted runbooks that automate complex operational decisions and repetitive tasks.
    • Building and evolving deployment pipelines using GitHub Actions or Semaphore to ensure delivery is both rapid and safe.
    • Focusing on toil reduction by developing internal tools that replace manual operational work with intelligent, autonomous systems.
  • Observability & Incident Management
    • Driving the evolution of the observability stack in Datadog by implementing the sophisticated metrics, traces, and logs needed to meet SLOs.
    • Leading incident response efforts and facilitating the blameless postmortems that help systematically reduce recovery time (MTTR).
    • Defining and monitoring the SLIs and SLOs that ensure the platform consistently meets rigorous healthcare performance standards.
  • Compliance & Collaboration
    • Ensuring every piece of infrastructure remains fully compliant with HIPAA and other critical healthcare regulatory requirements.
    • Mentoring engineers across the company on reliability best practices and contributing a clinical-safety perspective to cross-functional design reviews.

Who you are

  • You are a deeply proficient engineer who excels at the intersection of cloud infrastructure, automation, and system design.
  • You possess a meticulous approach to observability and a passion for finding the "root cause" rather than just applying a patch.
  • You enjoy exploring the "next frontier" of SRE, including how AI and agentic tools can make operations more efficient.
  • You thrive in fast-paced environments where technical rigor is balanced with pragmatism and clinical-grade safety.

This role may not be a fit if

  • You prefer working on static infrastructure rather than evolving systems through code and automation.
  • You are uncomfortable with the "agile" pace of tech-driven platform development or integrating AI tools into your daily workflow.
  • You prefer a siloed role that does not involve active participation in incident response or collaborative postmortems.

Requirements

  • 5+ years of experience in SRE, DevOps, or Platform roles managing production environments at scale.
  • Expert technical depth in AWS (EKS, EC2, RDS, S3) and production-grade Kubernetes management.
  • Proficiency with modern tooling including Terraform (IaC), Datadog (Observability), and CI/CD systems.
  • Strong coding and scripting skills in Python, Bash, Ruby, or Go.
  • Preferred: experience building agentic workflows or AI-assisted tooling to drive operational efficiency.
  • A "rigor-first" mindset with a dedication to HIPAA-compliant, high-availability architecture.

Recruitment fraud notice

  • Verify the Domain: Official recruitment emails will come from addresses ending in @fabrichealth.com or @gem.com. No other domain names are legitimate.
  • Official Interview Tools: We use Gem for our recruitment process and Google Meet for all video interviews. Google Meet is always the platform used for your first interview; you will never be sent a Zoom link to set up or conduct an initial interview. All interviews are conducted via video unless specifically stated by our team as an audio call. We never conduct interviews via chat, social media, Skype, or WhatsApp.
  • Zoom Usage: Zoom is utilized only for specific meetings set directly by our team for purposes outside of the standard interview process (e.g., coordination or onboarding discussions). It is never the first link you will receive from us.
  • Authorized Contact & Texting: Fabric will only contact you if you have submitted an application or if you are connected to a current employee who shared your information with us. We will only send text messages if you have provided explicit authorization and consent, either through your application or while communicating directly with our team. If you have not explicitly authorized us to reach out, treat any SMS or unsolicited outreach as fraudulent and do not respond.
  • Sensitive Data: We will never ask you for sensitive personal or financial documents (ID, banking info, SSN) during the application, interview, or candidacy stages. All sensitive data is handled through secure internal systems post-offer.
  • Verify the Team: You can reference LinkedIn to verify members of our recruiting team; however, please remain vigilant, as scammers may create fraudulent profiles. Always cross-reference the sender's email domain against our official @fabrichealth.com and @gem.com domains.

Perks & benefits

  • 401k
  • Unlimited Vacation
  • Paid Time Off
  • Equity Compensation
