Engineering Manager, Inference ML Runtime

Cerebras Systems
Sunnyvale, CA · 2w ago

About the role

<div class="content-intro"><p><span data-contrast="none">Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.</span></p> <p>Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups.&nbsp;<a href="https://openai.com/index/cerebras-partnership/">OpenAI recently announced a multi-year partnership with Cerebras</a> to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.</p> <p>Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.</p></div><h3><strong>About the Role</strong></h3> <p>The <strong>Inference ML Engineering team</strong> at Cerebras builds the runtime, APIs, and systems that power the fastest generative AI inference platform in the world.</p> <p>As an <strong>Engineering Manager, Inference ML Runtime</strong>, you will lead a team responsible for designing and scaling the systems that enable seamless execution of state-of-the-art AI models on Cerebras hardware. 
You will operate at the intersection of <strong>machine learning, distributed systems, and high-performance runtime engineering</strong>, translating cutting-edge research into production-ready infrastructure to serve a variety of text-only and multimodal models.</p> <p>This role combines <strong>technical leadership, people management, and execution ownership</strong>, with direct impact on Cerebras’ core inference platform.</p> <h3><strong>What You’ll Do</strong></h3> <p><strong>Technical Leadership</strong></p> <ul> <li>Own the architecture and evolution of the <strong>ML inference runtime and serving systems.</strong></li> <li>Guide the design of:</li> <ul> <li>high-throughput, low-latency inference pipelines;</li> <li>multimodal model execution (text, image, audio, video);</li> <li>scalable serving infrastructure for concurrent workloads.</li> </ul> <li>Partner with cloud, compiler, core runtime, hardware, and ML teams to <strong>optimize end-to-end performance.</strong></li> </ul> <p><strong>Team Leadership</strong></p> <ul> <li>Build, manage, and grow a team of <strong>ML systems and infrastructure engineers.</strong></li> <li>Provide technical direction, mentorship, and career development.</li> <li>Foster a culture of <strong>ownership, velocity, and engineering excellence.</strong></li> <li>Recruit top talent in <strong>ML systems, distributed systems, and runtime engineering.</strong></li> </ul> <p><strong>Execution &amp; Delivery</strong></p> <ul> <li>Drive execution of complex, cross-functional initiatives across:</li> <ul> <li>ML engineering;</li> <li>compiler/runtime teams;</li> <li>cloud and infrastructure teams.</li> </ul> <li>Own delivery of features such as:</li> <ul> <li>advanced inference capabilities (structured outputs, sampling strategies);</li> <li>heterogeneous model types, including text and multimodal;</li> <li>performance optimization (latency, throughput, memory efficiency);</li> <li>observability and reliability across the inference 
stack.</li> </ul> <li>Ensure high-quality releases through strong testing, validation, and operational rigor.</li> </ul> <p><strong>Platform &amp; Performance Ownership</strong></p> <ul> <li>Scale Cerebras’ inference platform to handle <strong>large volumes of concurrent requests</strong> at very high speed.</li> <li>Drive improvements in:</li> <ul> <li>latency;</li> <li>throughput;</li> <li>compute efficiency.</li> </ul> <li>Identify and prioritize <strong>technical debt and system bottlenecks.</strong></li> <li>Maintain Cerebras’ <strong>industry-leading inference speed advantage.</strong></li> </ul> <p><strong>Cross-Functional Collaboration</strong></p> <ul> <li>Partner with:</li> <ul> <li>ML researchers (model enablement);</li> <li>compiler teams (model execution optimization);</li> <li>cloud/platform teams (deployment and scaling).</li> </ul> <li>Act as a bridge between <strong>research, infrastructure, and production systems.</strong></li> </ul> <h3><strong>What You Bring</strong></h3> <p><strong>Required</strong></p> <ul> <li>8+ years of experience in:</li> <ul> <li>large-scale software engineering;</li> <li>ML systems or distributed systems.</li> </ul> <li>2+ years of engineering management experience.</li> <li>Strong programming skills in:</li> <ul> <li><strong>Python</strong> (production systems);</li> <li><strong>C++</strong> (performance-critical systems).</li> </ul> <li>Experience building and scaling <strong>large-scale inference systems</strong> (LLMs or multimodal).</li> <li>Experience working with cloud infrastructure and following best practices for building scalable microservices and applications.</li> </ul> <p><strong>Preferred</strong></p> <ul> <li>Experience with:</li> <ul> <li>LLM serving frameworks (e.g., vLLM, TensorRT-LLM, SGLang);</li> <li>PyTorch and deep learning frameworks;</li> <li>distributed systems and high-performance computing.</li> </ul> <li>Familiarity with:</li> <ul> <li>ML runtime systems;</li> <li>model execution 
pipelines;</li> <li>performance optimization for AI workloads.</li> </ul> </ul> <p><strong>Why This Role Matters</strong></p> <p>This team is central to Cerebras’ mission of delivering <strong>the fastest AI inference in the world</strong>. Your work will directly enable real-time AI applications and unlock new capabilities across enterprise and frontier AI use cases.</p><div class="content-conclusion"><h4><strong>Why Join Cerebras</strong></h4> <p>People who are serious about software make their own hardware. At Cerebras we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:</p> <ol> <li>Build a breakthrough AI platform beyond the constraints of the GPU.</li> <li>Publish and open source their cutting-edge AI research.</li> <li>Work on one of the fastest AI supercomputers in the world.</li> <li>Enjoy job stability with startup vitality.</li> <li>Our simple, non-corporate work culture that respects individual beliefs.</li> </ol> <p>Read our blog:&nbsp;<a href="https://www.cerebras.net/blog/5-reasons-to-join-cerebras" target="_blank" data-auth="NotApplicable" data-linkindex="0">Five Reasons to Join Cerebras in 2026.</a></p> <h4>Apply today and become part of the forefront of groundbreaking advancements in AI!</h4> <hr> <p><em>Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer.&nbsp;</em><em>We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. </em><em>We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.</em></p> <hr> <p><em>This website or its third-party tools process personal data. 
For more details, click <a href="https://www.cerebras.net/privacy/" target="_blank">here</a> to review our CCPA disclosure notice.</em></p></div>
