Site Reliability Engineer
Felix
Worldwide · Remote · 20h ago
- Employment: Full-time
About the role
- Manage and optimize our infrastructure on Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE).
- Automate provisioning and configuration using Terraform, Helm, and scripting languages such as Go, Python, and Bash.
- Build, maintain, and improve monitoring and alerting systems using OpenTelemetry standards.
- Participate in on-call rotations, incident response, and post-mortem analyses, ensuring rapid recovery and continuous learning from failures.
- Define and track SLOs/SLIs and error budgets to monitor service health and performance.
- Implement cloud security best practices to protect sensitive data and maintain the integrity of our systems.
- Collaborate across Engineering, Security, and Product teams to embed reliability and automation in every phase of development and deployment.
- Contribute to GKE cost optimization and resource management strategies to enhance efficiency and control operational spend.
Requirements
- 4+ years of experience as a Platform Engineer.
- Strong hands-on experience with GCP and GKE.
- Proficiency in Kubernetes (architecture, deployments, networking, and troubleshooting).
- Solid programming or scripting skills in Go, Python, or Bash.
- Proficiency with Docker and Linux.
- Experience with Terraform.
- Experience with Helm.
- Experience with GitHub Actions.
- Strong understanding of monitoring and observability using Prometheus, Grafana, and logging frameworks.
- Familiarity with incident management, on-call operations, and post-mortem processes.
- Knowledge of networking fundamentals (TCP/IP, DNS, load balancing).
- Experience with PostgreSQL or distributed databases.
- Awareness of FinOps and cloud cost management principles.
- Excellent problem-solving, communication, and collaboration skills, with a proactive mindset.
Nice to have
- GCP certifications, such as Professional DevOps Engineer or Cloud Architect.
- Certified Kubernetes Administrator (CKA).
- Experience in FinOps, cloud security, or regulated industries.
- Familiarity with PagerDuty or similar incident management tools.
- Background implementing SLOs/SLIs and error budgets in production environments.
- These are the listed requirements; equivalent competencies in any of the above will also be considered.
What we offer
- Competitive salary
- Initial stock options grant
- Annual performance bonus
- Health, dental, and vision plans
- Remote work environment; we also have offices in Miami and Mexico City and welcome a hybrid arrangement if you prefer one
- Continuous learning opportunities
- Unlimited PTO
- Paid parental leave
- Empowering opportunities for growth in a dynamic entrepreneurial environment
Perks & benefits
- Vision Insurance
- Unlimited Vacation
- Paid Time Off
- Equity Compensation