Member of Technical Staff — Kernel / Compiler / Communication
radixark
Palo Alto, posted 3w ago
- Seniority: Staff
<h2 data-start="382" data-end="403"><strong data-start="385" data-end="403">About the Role</strong></h2>
<p data-start="405" data-end="553">RadixArk is seeking a<span class="Apple-converted-space"> </span><strong data-start="427" data-end="492">Member of Technical Staff — Kernel / Compiler / Communication</strong><span class="Apple-converted-space"> </span>to push the limits of performance for frontier AI systems.</p>
<p data-start="555" data-end="739">You will work at the lowest layers of the stack — kernels, runtimes, compilers, and communication libraries — to unlock maximum efficiency from modern accelerators and interconnects.</p>
<p data-start="741" data-end="956">This role is critical to scaling training and inference across thousands of GPUs, where microseconds and memory bandwidth matter. Your work will directly shape the performance envelope of next-generation AI systems.</p>
<p data-start="958" data-end="1109">This is a deeply technical role for engineers who enjoy working close to hardware and solving performance problems that most engineers never encounter.</p>
<h2 data-start="1116" data-end="1135"><strong data-start="1119" data-end="1135">Requirements</strong></h2>
<ul data-start="1137" data-end="1693">
<li data-start="1137" data-end="1212">
<p data-start="1139" data-end="1212">5+ years of experience in systems, compiler, or performance engineering</p>
</li>
<li data-start="1213" data-end="1268">
<p data-start="1215" data-end="1268">Strong expertise in CUDA or accelerator programming</p>
</li>
<li data-start="1269" data-end="1332">
<p data-start="1271" data-end="1332">Deep understanding of GPU architecture and memory hierarchy</p>
</li>
<li data-start="1333" data-end="1394">
<p data-start="1335" data-end="1394">Experience writing or optimizing high-performance kernels</p>
</li>
<li data-start="1395" data-end="1459">
<p data-start="1397" data-end="1459">Strong background in compilers, runtimes, or code generation</p>
</li>
<li data-start="1460" data-end="1539">
<p data-start="1462" data-end="1539">Experience with distributed communication libraries (NCCL, MPI, RCCL, etc.)</p>
</li>
<li data-start="1540" data-end="1603">
<p data-start="1542" data-end="1603">Solid knowledge of networking and interconnect technologies</p>
</li>
<li data-start="1604" data-end="1637">
<p data-start="1606" data-end="1637">Proficiency in C++ and Python</p>
</li>
<li data-start="1638" data-end="1693">
<p data-start="1640" data-end="1693">Strong debugging and profiling skills at the system level</p>
</li>
</ul>
<h3 data-start="1700" data-end="1719"><strong data-start="1704" data-end="1719">Strong Pluses</strong></h3>
<ul data-start="1721" data-end="2153">
<li data-start="1721" data-end="1766">
<p data-start="1723" data-end="1766">Experience with Triton, TVM, XLA, or MLIR</p>
</li>
<li data-start="1767" data-end="1828">
<p data-start="1769" data-end="1828">Experience building compiler passes or IR transformations</p>
</li>
<li data-start="1829" data-end="1877">
<p data-start="1831" data-end="1877">Familiarity with NVLink, InfiniBand, or RDMA</p>
</li>
<li data-start="1878" data-end="1937">
<p data-start="1880" data-end="1937">Experience optimizing collective communication at scale</p>
</li>
<li data-start="1938" data-end="1991">
<p data-start="1940" data-end="1991">Background in HPC or performance-critical systems</p>
</li>
<li data-start="1992" data-end="2051">
<p data-start="1994" data-end="2051">Open-source contributions to kernel, compiler, or ML systems projects</p>
</li>
<li data-start="2052" data-end="2098">
<p data-start="2054" data-end="2098">Experience scaling workloads to 1000+ GPUs</p>
</li>
<li data-start="2099" data-end="2153">
<p data-start="2101" data-end="2153">Experience with mixed-precision or quantized kernels</p>
</li>
</ul>
<h2 data-start="2160" data-end="2183"><strong data-start="2163" data-end="2183">Responsibilities</strong></h2>
<ul data-start="2185" data-end="2761">
<li data-start="2185" data-end="2251">
<p data-start="2187" data-end="2251">Design and implement high-performance kernels for AI workloads</p>
</li>
<li data-start="2252" data-end="2307">
<p data-start="2254" data-end="2307">Optimize compiler and runtime stacks for ML systems</p>
</li>
<li data-start="2308" data-end="2370">
<p data-start="2310" data-end="2370">Improve communication efficiency across large GPU clusters</p>
</li>
<li data-start="2371" data-end="2439">
<p data-start="2373" data-end="2439">Reduce latency and increase throughput for distributed workloads</p>
</li>
<li data-start="2440" data-end="2501">
<p data-start="2442" data-end="2501">Profile and eliminate system bottlenecks across the stack</p>
</li>
<li data-start="2502" data-end="2579">
<p data-start="2504" data-end="2579">Collaborate with training and inference teams on performance optimization</p>
</li>
<li data-start="2580" data-end="2638">
<p data-start="2582" data-end="2638">Develop tooling for profiling and performance analysis</p>
</li>
<li data-start="2639" data-end="2712">
<p data-start="2641" data-end="2712">Contribute to long-term architecture for performance-critical systems</p>
</li>
<li data-start="2713" data-end="2761">
<p data-start="2715" data-end="2761">Push the limits of hardware–software co-design</p>
</li>
</ul>
<h2 data-start="2768" data-end="2789"><strong data-start="2771" data-end="2789">About RadixArk</strong></h2>
<p>RadixArk is an infrastructure-first company built by engineers who've shipped production AI systems, created SGLang (20K+ GitHub stars, the fastest open LLM serving engine), and developed Miles (our large-scale RL framework).</p>
<p>We're on a mission to democratize frontier-level AI infrastructure by building world-class open systems for inference and training.</p>
<p>Our team has optimized kernels serving billions of tokens daily, designed distributed training systems coordinating 10,000+ GPUs, and contributed to infrastructure that powers leading AI companies and research labs.</p>
<p>We're backed by well-known infrastructure investors and partner with NVIDIA, Google, AWS, and frontier AI labs.</p>
<p>Join us in building infrastructure that gives real leverage back to the AI community.</p>
<h2 data-start="3345" data-end="3364"><strong data-start="3348" data-end="3364">Compensation</strong></h2>
<p data-start="3366" data-end="3536">We offer competitive compensation with meaningful equity, comprehensive benefits, and flexible work arrangements. Compensation depends on location, experience, and level.</p>
<h2 data-start="3543" data-end="3567"><strong data-start="3546" data-end="3567">Equal Opportunity</strong></h2>
<p data-start="3569" data-end="3656">RadixArk is an Equal Opportunity Employer and is proud to offer equal employment opportunity to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.</p>
Perks & benefits
- Equity Compensation