Member of Technical Staff, Inference & Serving

Inception

Bay Area · $200k–350k · On-site · Posted 3mo ago
Employment: Full-time
Seniority: Staff

About the role

  • Build and optimize high-performance model serving systems for low-latency inference of diffusion LLMs.
  • Extend orchestration frameworks (Kubernetes, Ray, SLURM) for distributed inference, evaluation, and large-batch serving.
  • Implement and manage load balancing, autoscaling, and traffic routing for model endpoints.
  • Build systems for model versioning, canary deployments, and zero-downtime rollouts.
  • Develop monitoring, alerting, and observability tooling to ensure SLA compliance and rapid incident response.
  • Collaborate with ML researchers to translate model advances (new architectures, quantization techniques, batching strategies) into production-ready serving improvements.

Qualifications

  • BS/MS/PhD in Computer Science, Engineering, or a related field (or equivalent experience).
  • Knowledge of ML serving frameworks (SGLang, vLLM, Triton Inference Server, TensorRT-LLM).
  • Understanding of ML frameworks (PyTorch, TensorFlow) from a systems perspective.
  • Familiarity with high-performance computing and GPU programming (CUDA).
  • Experience with containerization (Docker), orchestration (Kubernetes), and CI/CD pipelines.
  • Background in performance optimization and profiling of ML systems.
  • Experience building and maintaining large-scale language models with tens of billions of parameters or more.
  • Experience with distributed systems and cloud computing platforms (AWS/GCP/Azure).
  • Experience with ML workflow orchestration tools (Kubeflow, Airflow).
  • Experience with model optimization techniques (quantization, distillation, speculative decoding, continuous batching).
  • Knowledge of ML-specific infrastructure challenges (checkpointing, resource scheduling, etc.).

Compensation

  • Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
  • Shape Foundational Technology: Your decisions will influence how the next generation of AI products is built and used
  • Immediate Impact: Join on the ground floor, where your contributions directly shape product direction and company trajectory
  • Competitive salary and equity in a rapidly growing startup
  • Flexible vacation and paid time off (PTO)
  • Health, dental, and vision insurance
  • 401k match
  • Catered meals (breakfast, lunch, & dinner)
  • Commuter subsidies
  • A collaborative and inclusive culture

