【SRE】Site Reliability Engineer
funnow
- Employment
- Full-time
About the role
【Capsule】
At FunNow, we’re building joyful experiences, at the speed of now. As a Site Reliability Engineer, you’ll play a crucial role in ensuring our platform stays fast, resilient, and secure for millions of users booking spontaneous fun across Asia. But here’s the twist: we don’t just monitor uptime — we build with AI and automation. From Kubernetes tuning to auto-healing infrastructure, CI/CD pipelines to incident response, you'll be hands-on in evolving our DevOps culture. If you love scalable systems, believe in developer efficiency, and treat infrastructure as code, welcome aboard.
【Typical Accountability】
- Design robust architectures to comprehensively improve system availability, scalability, and service quality
- Ensure stable service operation, monitor core service status, and quickly troubleshoot issues
- Conduct in-depth analysis of system performance bottlenecks, then propose and implement improvements
- Maintain and optimize Kubernetes clusters (EKS/GKE), effectively handling resource pressure, node anomalies, and other situations
- Maintain and improve CI/CD pipelines and automated deployment systems (GitHub Actions / ArgoCD) to improve the engineering team's development efficiency
- Establish and continuously optimize system monitoring and alerting mechanisms (Prometheus / Grafana / Alertmanager)
- Assist with incident response and problem investigation
- Regularly participate in system inspections and audits, proactively proposing and implementing improvements
- Assist in maintaining and implementing fundamental security settings (e.g., IAM, resource permissions, encrypted storage)
- Actively share your experience to collectively enhance the team's engineering culture
【Essential Competencies】
- Familiarity with container technologies such as Docker or Kubernetes, and practical experience with Kubernetes operations (deployment, scheduling, resource management)
- Familiarity with AWS services (e.g., ECS, EKS, S3, CloudFront, IAM, VPC), and practical experience maintaining AWS or GCP environments (we primarily use AWS)
- Familiarity with at least one CI/CD tool (e.g., GitHub Actions, GitLab CI)
- Proficiency in MySQL daily management and performance analysis
- Familiarity with service-related log analysis and monitoring tools (e.g., CloudWatch, ELK/EFK, Grafana), and practical experience with Prometheus/Grafana
- Experience maintaining Elasticsearch clusters
- Familiarity with Git and basic Git flow operations
- High degree of self-management, proactive and responsible work attitude, meticulousness, and excellent communication and teamwork skills
【Desirable Competencies】
- Exposure to or familiarity with the Golang ecosystem
- Familiarity with Infrastructure-as-Code tools such as CDK or Terraform
- Experience with IPO advisory or ISO audit
- Security awareness
【Who You Are】
- You enjoy solving real-world problems, are proactive in investigation, and act quickly
- You value stability and data accuracy, and possess a high sense of responsibility
- You are passionate about learning new tools and enjoy sharing improvement methods
- You maintain clear communication and good documentation habits in team collaboration