Staff AI Infrastructure Engineer

Lumalabs Ai

SF Bay Area | $230k–360k | Hybrid | Posted 3mo ago
Employment: Full-time
Seniority: Staff

About the role

About Luma AI

Where You Come In

  • Kernels
  • Containers
  • Schedulers
  • Networking
  • Storage
  • GPU behavior

What You’ll Own

Reliability of the Frontier

  • Architect and operate large, heterogeneous GPU environments under extreme demand
  • Improve utilization and performance where small gains materially change company outcomes
  • Resolve failures that span hardware, OS, runtimes, and orchestration
  • Eliminate entire classes of instability
  • Build mechanisms that make heroics unnecessary

Scaling Training & Inference

  • Define how infrastructure and workloads evolve as cluster size and concurrency grow
  • Design scheduling, placement, and resource management approaches for increasingly complex jobs
  • Work directly with research to build the systems required for new model capabilities
  • Ensure inference platforms scale rapidly without sacrificing reliability or latency
  • Anticipate where today’s abstractions will fail and redesign ahead of them

Building the Organization

  • Hire and develop exceptional systems and reliability engineers
  • Set the bar for technical depth, judgment, and production ownership
  • Shape architecture early through strong partnerships with research and product
  • Translate reliability constraints into long-term platform strategy

Who You Are

Required:

  • Deep expertise in Linux and distributed systems
  • Experience operating GPU / accelerator clusters in real production environments
  • Strong fluency in Kubernetes and modern open-source infrastructure
  • Comfortable debugging across hardware → kernel → runtime → orchestration
  • You understand how systems behave under contention and at scale
  • You write code and build automation
  • You think in bottlenecks, failure modes, and tradeoffs
  • Engineers trust your judgment, especially when things break


Leadership Expectations

  • You raise reliability standards across the company
  • You influence product and research architecture early
  • You build strong partnerships, not ticket queues
  • You attract and level up exceptional engineers
  • You are curious about how models use infrastructure, because improving systems expands what becomes possible

Why This Role Is Special

  • How research progresses
  • How products scale
  • How customers trust us
  • And how the engineering organization grows

Compensation

About Luma
