
Senior SRE Engineer

Kronos Research
Taiwan · 2w ago
Seniority
Senior

About the role

<h3>Responsibilities</h3>
<p><strong>Linux Systems &amp; Automation (Core)</strong></p>
<p>- Manage large-scale Linux environments: troubleshooting and root-cause analysis<br>
- Write maintainable, hand-off-ready Bash / Ansible / Python automation<br>
- On-call for infrastructure, CI/CD, and production service incidents</p>
<p><strong>HPC Cluster &amp; Storage</strong></p>
<p>- Operate HPC clusters (Slurm) along with usage analytics, auditing, and monitoring tools<br>
- Maintain and plan storage for compute environments (Lustre, NAS)</p>
<p><strong>Cloud &amp; Hybrid Infrastructure</strong></p>
<p>- Manage multi-cloud environments (AWS, Alibaba Cloud, GCP) with Terraform / AWS CDK<br>
- Build and operate Docker (ECS) / Kubernetes (EKS) environments and their deployment workflows</p>
<p><strong>CI/CD &amp; Developer Experience</strong></p>
<p>- Operate self-hosted GitLab server and Runner fleet<br>
- Operate CI/CD systems and design deployment pipelines for research and other projects</p>
<p><strong>GenAI / Internal Platform</strong></p>
<p>- Build internal AI platforms (LangChain / LangGraph / Bedrock, Elasticsearch RAG)<br>
- Develop MCP servers, chatbots, AI agents, and similar services</p>
<h3>Requirements</h3>
<p>- <strong>5+ years</strong> of hands-on Linux systems administration and infrastructure operations experience<br>
- Solid Linux internals knowledge (process / memory / filesystem / networking / systemd / cgroup); able to localize issues even without complete logs<br>
- Strong Bash / Shell scripting skills: able to write maintainable scripts that others can pick up<br>
- Programming ability for data processing, CLI tools, and API services; Python proficiency preferred<br>
- Solid storage fundamentals with hands-on experience: RAID levels and rebuild trade-offs, filesystem selection, snapshot and backup planning; NAS / shared storage (NFS / SMB) operations experience<br>
- Experience with at least one major public cloud (AWS / GCP / Alibaba Cloud) and IaC tooling (Terraform / CDK / Ansible)<br>
- Familiar with containerization and orchestration (Docker, Kubernetes)<br>
- CI/CD pipeline design and operations experience (GitLab CI / Jenkins / Airflow)<br>
- Able to own a cross-service subsystem end-to-end: design, implementation, documentation, handoff<br>
- <strong>Strong autonomy</strong>: can drive a problem from discovery, root-cause investigation, decision-making, to delivery with minimal supervision; able to make judgment calls under incomplete information and proactively communicate progress, risks, and rationale<br>
- <strong>Self-directed</strong>: doesn't wait for tickets; identifies problems worth solving and prioritizes them independently</p>
<h3>Nice to Have</h3>
<p>- HPC scheduler experience (Slurm / PBS / LSF)<br>
- Parallel filesystem operations experience (Lustre / GPFS / BeeGFS)<br>
- Advanced Linux performance analysis (perf, eBPF, ftrace) and kernel parameter tuning<br>
- DB operations experience (MySQL, ClickHouse)<br>
- Low-latency network tuning and cross-datacenter link optimization<br>
- LLM application development (LangChain, RAG, Agent, MCP)<br>
- Self-managed Kubernetes experience (Kubespray, kubeadm)<br>
- GPU server operations (single-node): NVIDIA driver / CUDA toolkit version management, <code>nvidia-smi</code> / DCGM monitoring, nvidia-container-toolkit integration, troubleshooting XID / ECC errors and thermal throttling<br>
- Experience or familiarity with integrating GPU resources into Slurm: GRES configuration, cgroup-based GPU isolation, user/job-level resource limits</p>
