Member of Technical Staff, Inference & Serving
Inception
Bay Area · $200k–350k · On-site · 3mo ago
- Employment: Full-time
- Seniority: Staff
About the role
- Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs.
- Extend orchestration frameworks (Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
- Implement and manage load balancing, autoscaling, and traffic routing for model endpoints.
- Build systems for model versioning, canary deployments, and zero-downtime rollouts.
- Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response.
- Collaborate with ML researchers to translate model advances (new architectures, quantization techniques, batching strategies) into production-ready serving improvements.
Qualifications
- BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
- Knowledge of ML serving frameworks (SGLang, vLLM, Triton Inference Server, TensorRT-LLM).
- Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
- Familiarity with high-performance computing and GPU programming (CUDA).
- Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
- Background in performance optimization and profiling of ML systems.
- Experience building and maintaining large-scale language models with tens of billions of parameters or more.
- Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
- Experience with ML workflow orchestration tools (Kubeflow, Airflow).
- Experience with model optimization techniques (quantization, distillation, speculative decoding, continuous batching).
- Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.).
Compensation
- Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
- Shape Foundational Technology: Your decisions will influence how the next generation of AI products is built and used
- Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory
- Competitive salary and equity in a rapidly growing startup
- Flexible vacation and paid time off (PTO)
- Health, dental, and vision insurance
- 401k match
- Catered meals (breakfast, lunch, & dinner)
- Commuter subsidies
- A collaborative and inclusive culture
Perks & benefits
- 401k
- Vision Insurance
- Unlimited Vacation
- Paid Time Off
- Pension Matching
- Equity Compensation