Member of Technical Staff - RL Algorithms

vmax

San Francisco, posted 4w ago

Seniority: Staff

About the role

<h2><strong>About <em>V<sub>max</sub></em></strong></h2> <p><em>V<sub>max</sub></em> is an applied research lab developing AI capable of open-ended learning. We are building systems to exceed humans in all capacities by optimizing beyond the local maxima of learning from human expertise.</p> <h2>About the role</h2> <p>RL has become the de facto method for post-training LLMs. We are limited by the sample efficiency of the policy gradient algorithms in use today, and we are looking for a talented researcher to weave together pre-LLM and post-LLM approaches to learning from experience.</p> <h2 data-section-id="r8dte7" data-start="877" data-end="896">Responsibilities</h2> <ul data-start="898" data-end="2324"> <li data-section-id="fqv0il" data-start="898" data-end="1037">Develop new RL algorithms for post-training language models.</li> <li data-section-id="1ozi2z3" data-start="1211" data-end="1410">Adapt ideas from pre-LLM reinforcement learning, such as model-based RL, temporal abstraction, and value-based learning, to modern LLM and agentic settings.</li> <li data-section-id="1avbham" data-start="1706" data-end="1858">Establish empirical baselines and evaluation protocols for measuring sample efficiency, robustness, generalization, and reward exploitation in LLM RL.</li> <li data-section-id="t9axsv" data-start="1859" data-end="2010">Analyze failure modes of RL-trained models, including reward hacking, mode collapse, over-optimization, exploration failures, and distribution shift.</li> <li data-section-id="1f2sf2y" data-start="2011" data-end="2185">Collaborate with researchers working on environments, evals, interpretability, reward modeling, and infrastructure to turn algorithmic ideas into reliable training systems.</li> <li data-section-id="soyec5" data-start="2186" data-end="2324">Own and develop a research agenda within <em>V<sub>max</sub></em>, from identifying promising directions to executing experiments and communicating results.</li> </ul> <h2 data-section-id="w1j6vz" 
data-start="1480" data-end="1503">Minimum Requirements</h2> <ul data-start="1505" data-end="2409"> <li data-section-id="jzpo23" data-start="1505" data-end="1638">PhD or equivalent experience in machine learning, reinforcement learning, or a closely related field.</li> <li data-section-id="133n9jr" data-start="1639" data-end="1795">Track record of research excellence, as demonstrated by publications, open source work, deployed AI systems, or other substantial technical contributions.</li> <li data-section-id="9085d6" data-start="1796" data-end="1931">Deep understanding of modern machine learning, especially reinforcement learning, representation learning, and large language models.</li> <li data-section-id="o0dcx3" data-start="1932" data-end="2066">Strong familiarity with LLM post-training methods.</li> <li data-section-id="1oimshc" data-start="2067" data-end="2199">Experience designing and running rigorous ML experiments, including ablations, baselines, evaluation design, and failure analysis.</li> <li>Experience with large-scale ML infrastructure, distributed training, experiment tracking, data pipelines, and debugging unstable training runs.</li> <li data-section-id="flbc6g" data-start="2200" data-end="2283">Expertise with Python and at least one major ML framework such as PyTorch or JAX.</li> <li data-section-id="fpdhga" data-start="2284" data-end="2409">Ability to work independently on open-ended research problems and turn ambiguous ideas into concrete experimental programs.</li> </ul> <h2 data-section-id="17hey2t" data-start="3495" data-end="3510">Nice to have</h2> <ul data-start="3512" data-end="4720"> <li data-section-id="1o2i8j2" data-start="3512" data-end="3663">Experience developing new RL algorithms or improving existing ones in domains such as robotics, games, simulated control, language models, or agents.</li> <li data-section-id="1o2i8j2" data-start="3512" data-end="3663">Experience with LLM pre-training.</li> <li data-section-id="qc63t5" 
data-start="4135" data-end="4263">Strong understanding of reward modeling, verifiers, process supervision, outcome supervision, or automated evaluation systems.</li> <li data-section-id="qc63t5" data-start="4135" data-end="4263">Demonstrated software engineering ability.</li> <li data-section-id="xjxuh2" data-start="4545" data-end="4720">Strong communication skills, especially the ability to explain algorithmic ideas, empirical results, and research implications to both technical and non-technical audiences.</li> </ul> <h2><strong>Role-specific location policy</strong></h2> <ul> <li>This role is based in our San Francisco office; for exceptional candidates, we are willing to consider a hybrid arrangement.</li> </ul> <h2>Compensation</h2> <p>The expected salary range for this position is $300,000 to $500,000 USD.</p>
