
ML Research Engineer (Inference)

Cerebras Systems
Bengaluru · 2w ago

About the role

<div class="content-intro"><p><span data-contrast="none">Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.&nbsp;</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;201341983&quot;:0,&quot;335559685&quot;:0,&quot;335559737&quot;:240,&quot;335559738&quot;:240,&quot;335559739&quot;:240,&quot;335559740&quot;:279}">&nbsp;</span></p> <p>Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups.&nbsp;<a href="https://openai.com/index/cerebras-partnership/">OpenAI recently announced a multi-year partnership with Cerebras</a>, to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.&nbsp;</p> <p>Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. 
This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.</p></div><p><strong><span data-contrast="none"><span data-ccp-parastyle="heading 3">About The Role</span></span></strong><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;335559738&quot;:281,&quot;335559739&quot;:281}">&nbsp;</span></p> <p><span data-contrast="auto">As a </span><strong><span data-contrast="auto">Research Engineer</span></strong><span data-contrast="auto">&nbsp;on the Inference ML team at&nbsp;Cerebras&nbsp;Systems, you will adapt today's most advanced language and vision models to run efficiently on our flagship&nbsp;Cerebras&nbsp;architecture.&nbsp;You'll&nbsp;work alongside ML researchers and engineers to design, prototype,&nbsp;validate, and&nbsp;optimize&nbsp;models, gaining end-to-end exposure to&nbsp;cutting-edge&nbsp;inference research on the world's fastest AI accelerator.</span><span data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}">&nbsp;</span></p> <p><span data-contrast="auto">You will focus on pushing the frontier of&nbsp;</span><strong><span data-contrast="auto">speculative decoding</span></strong><span data-contrast="auto">,&nbsp;</span><strong><span data-contrast="auto">large-model pruning and compression</span></strong><span data-contrast="auto">,&nbsp;</span><strong><span data-contrast="auto">sparse attention</span></strong><span data-contrast="auto">, and&nbsp;</span><strong><span data-contrast="auto">sparsity-driven</span></strong><span data-contrast="auto">&nbsp;techniques to deliver low-latency, high-throughput inference at scale.</span><span 
data-ccp-props="{&quot;134233117&quot;:false,&quot;134233118&quot;:false,&quot;335559738&quot;:240,&quot;335559739&quot;:240}">&nbsp;</span></p> <p><strong>Responsibilities</strong>&nbsp;</p> <ul> <li>Implement and adapt transformer-based models (NLP and/or vision) to run on Cerebras hardware</li> <li>Assist in optimizing models for inference performance (latency, throughput)</li> <li>Run experiments, analyze results, and support model improvements</li> <li>Help bring up and validate models on the Cerebras system</li> <li>Debug and troubleshoot model or system issues with guidance from senior team members</li> <li>Support profiling and performance analysis using internal tools</li> <li>Collaborate with cross-functional teams (ML, software, hardware) on model integration</li> </ul> <p><strong>Minimum Qualifications</strong>&nbsp;</p> <ul> <li>Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field</li> <li>1–3 years of experience in software engineering or machine learning in a similar capacity (internships count)</li> <li>Experience with Python and at least one ML framework (e.g., PyTorch, Transformers, vLLM or SGLang)</li> <li>Understanding of deep learning concepts (e.g., neural networks, transformers)</li> <li>Experience with Generative AI and Machine Learning systems</li> <li>Strong programming skills in Python and/or C++</li> </ul> <p><strong>Preferred Qualifications</strong>&nbsp;</p> <ul> <li>Experience with&nbsp;speculative decoding,&nbsp;neural network pruning and compression,&nbsp;sparse attention,&nbsp;quantization,&nbsp;sparsity, post-training techniques, and inference-focused evaluations.&nbsp;</li> <li>Exposure to large language models or computer vision models</li> <li>Experience running experiments or tuning models</li> <li>Familiarity with tools like PyTorch, Hugging Face Transformers, or similar</li> <li>Basic understanding of performance concepts (e.g., latency, throughput)</li> <li>Experience working in Linux 
environments</li> </ul><div class="content-conclusion"><h4><strong>Why Join Cerebras</strong></h4> <p>People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:</p> <ol> <li>Build a breakthrough AI platform beyond the constraints of the GPU.</li> <li>Publish and open-source their cutting-edge AI research.</li> <li>Work on one of the fastest AI supercomputers in the world.</li> <li>Enjoy job stability with startup vitality.</li> <li>Enjoy a simple, non-corporate work culture that respects individual beliefs.</li> </ol> <p>Read our blog:&nbsp;<a href="https://www.cerebras.net/blog/5-reasons-to-join-cerebras" target="_blank" data-auth="NotApplicable" data-linkindex="0">Five Reasons to Join Cerebras in 2026.</a></p> <h4>Apply today and join the forefront of groundbreaking advancements in AI!</h4> <hr> <p><em>Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer.&nbsp;</em><em>We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. </em><em>We try every day to build a work environment that empowers people to do their best work through continuous learning, growth, and support of those around them.</em></p> <hr> <p><em>This website or its third-party tools process personal data. For more details, click <a href="https://www.cerebras.net/privacy/" target="_blank">here</a> to review our CCPA disclosure notice.</em></p></div>
