Member of Technical Staff — Inference

RadixArk

Palo Alto · 4d ago
Seniority
Staff

<h2 data-start="254" data-end="275"><strong data-start="257" data-end="275">About the Role</strong></h2> <p data-start="277" data-end="390">RadixArk is seeking a<span class="Apple-converted-space">&nbsp;</span><strong data-start="299" data-end="340">Member of Technical Staff — Inference</strong><span class="Apple-converted-space">&nbsp;</span>to push the limits of large-scale AI inference.</p> <p data-start="392" data-end="652">You will work on the core systems that serve frontier models at scale, optimizing performance, latency, throughput, and cost across thousands of GPUs. This role sits at the intersection of systems engineering, ML infrastructure, and performance optimization.</p> <p data-start="654" data-end="760">Your work will directly shape how state-of-the-art models are deployed and experienced by users worldwide.</p> <p data-start="762" data-end="930">This is a deeply technical, high-impact role for engineers who enjoy working close to the hardware–software boundary and solving performance-critical problems at scale.</p> <h2 data-start="937" data-end="956"><strong data-start="940" data-end="956">Requirements</strong></h2> <ul data-start="958" data-end="1592"> <li data-start="958" data-end="1067"> <p data-start="960" data-end="1067">5+ years of experience in systems engineering, ML infrastructure, or performance-critical backend systems</p> </li> <li data-start="1068" data-end="1151"> <p data-start="1070" data-end="1151">Strong expertise in large-scale inference systems for LLMs or generative models</p> </li> <li data-start="1152" data-end="1226"> <p data-start="1154" data-end="1226">Deep understanding of GPU architecture and performance characteristics</p> </li> <li data-start="1227" data-end="1304"> <p data-start="1229" data-end="1304">Experience optimizing latency- and throughput-critical production systems</p> </li> <li data-start="1305" data-end="1376"> <p data-start="1307" data-end="1376">Strong knowledge of distributed systems and networking 
fundamentals</p> </li> <li data-start="1377" data-end="1443"> <p data-start="1379" data-end="1443">Proficiency in Python, Rust, C++, or Go for production systems</p> </li> <li data-start="1444" data-end="1511"> <p data-start="1446" data-end="1511">Experience profiling and optimizing compute-intensive workloads</p> </li> <li data-start="1512" data-end="1592"> <p data-start="1514" data-end="1592">Strong debugging skills across system layers (model, runtime, kernel, network)</p> </li> </ul> <h3 data-start="1599" data-end="1618"><strong data-start="1603" data-end="1618">Strong Plus</strong></h3> <ul data-start="1620" data-end="2000"> <li data-start="1620" data-end="1693"> <p data-start="1622" data-end="1693">Experience with LLM serving stacks (SGLang, vLLM, TensorRT-LLM, etc.)</p> </li> <li data-start="1941" data-end="2000"> <p data-start="1943" data-end="2000">Open-source contributions in ML or systems infrastructure</p> </li> <li data-start="1694" data-end="1758"> <p data-start="1696" data-end="1758">Familiarity with CUDA, Triton, or custom kernel optimization</p> </li> <li data-start="1759" data-end="1835"> <p data-start="1761" data-end="1835">Experience with batching, KV-cache management, and scheduling strategies</p> </li> <li data-start="1836" data-end="1890"> <p data-start="1838" data-end="1890">Experience running inference at scale (1000+ GPUs)</p> </li> <li data-start="1891" data-end="1940"> <p data-start="1893" data-end="1940">Background in HPC or high-performance systems</p> </li> </ul> <h2 data-start="2007" data-end="2030"><strong data-start="2010" data-end="2030">Responsibilities</strong></h2> <ul data-start="2032" data-end="2652"> <li data-start="2032" data-end="2105"> <p data-start="2034" data-end="2105">Design and build large-scale inference systems for frontier AI models</p> </li> <li data-start="2106" data-end="2183"> <p data-start="2108" data-end="2183">Optimize latency, throughput, and GPU utilization in production inference</p> </li> <li 
data-start="2184" data-end="2248"> <p data-start="2186" data-end="2248">Develop and improve model serving architectures and runtimes</p> </li> <li data-start="2249" data-end="2315"> <p data-start="2251" data-end="2315">Work on batching, scheduling, and memory management strategies</p> </li> <li data-start="2316" data-end="2400"> <p data-start="2318" data-end="2400">Collaborate with kernel, compiler, and systems teams on performance optimization</p> </li> <li data-start="2401" data-end="2451"> <p data-start="2403" data-end="2451">Debug performance bottlenecks across the stack</p> </li> <li data-start="2452" data-end="2517"> <p data-start="2454" data-end="2517">Drive reliability and scalability of inference infrastructure</p> </li> <li data-start="2518" data-end="2590"> <p data-start="2520" data-end="2590">Build tooling for observability, profiling, and performance analysis</p> </li> <li data-start="2591" data-end="2652"> <p data-start="2593" data-end="2652">Contribute to long-term inference architecture and strategy</p> </li> </ul> <h2 data-start="2659" data-end="2680"><strong data-start="2662" data-end="2680">About RadixArk</strong></h2> <p>RadixArk is an infrastructure-first company built by engineers who've shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles (our large-scale RL framework).</p> <p>We're on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training.</p> <p>Our team has optimized kernels serving billions of tokens daily, designed distributed training systems coordinating 10,000+ GPUs, and contributed to infrastructure that powers leading AI companies and research labs.</p> <p>We're backed by well-known infrastructure investors and partner with Nvidia, Google, AWS, and frontier AI labs.</p> <p>Join us in building infrastructure that gives real leverage back to the AI community.</p> <h2 data-start="3402" 
data-end="3421"><strong data-start="3405" data-end="3421">Compensation</strong></h2> <p data-start="3423" data-end="3593">We offer competitive compensation with meaningful equity, comprehensive benefits, and flexible work arrangements. Compensation depends on location, experience, and level.</p> <h2 data-start="3600" data-end="3624"><strong data-start="3603" data-end="3624">Equal Opportunity</strong></h2> <p data-start="3626" data-end="3713">RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.</p>

Perks & benefits

  • Equity Compensation
