Member of Technical Staff - Mechanistic Interpretability

vmax

San Francisco · 4w ago

Seniority: Staff

About the role

<h2><strong>About <em>V<sub>max</sub></em></strong></h2>
<p><em>V<sub>max</sub></em> is an applied research lab developing AI capable of open-ended learning. We are building systems to exceed humans in all capacities by optimising beyond the local maxima of learning from human expertise.</p>
<h2>About the role</h2>
<p>LLMs are fantastically powerful, and a rapidly growing body of work is devoted to understanding their internal representations and computations. We use the tools of mechanistic interpretability to enhance reinforcement learning by generating intrinsic rewards as a supplement or alternative to downstream human-generated verifiers.</p>
<h2 data-section-id="r8dte7" data-start="3354" data-end="3373">Responsibilities</h2>
<ul data-start="3375" data-end="4471">
<li data-section-id="f3yyqu" data-start="3375" data-end="3522">Develop methods for using mechanistic interpretability to extract useful training signals from the internal states of language models.</li>
<li data-section-id="1s4luhz" data-start="3523" data-end="3644">Turn representations, features, circuits, and causal model behaviors into intrinsic rewards for reinforcement learning.</li>
<li data-section-id="e3grab" data-start="3645" data-end="3777">Compare interpretability-derived rewards against human feedback, learned reward models, verifiers, and task-level outcome rewards.</li>
<li data-section-id="1uwo5im" data-start="3905" data-end="4074">Design metrics and baselines for reward quality, including alignment with intended behavior, generalization across tasks, robustness, and resistance to reward hacking.</li>
<li data-section-id="1i0vdyl" data-start="4075" data-end="4208">Investigate how internal representations evolve during RL and post-training, and use these insights to improve training objectives.</li>
<li data-section-id="151c5t8" data-start="4209" data-end="4335">Develop infrastructure for reproducible, large-scale experiments on LLM agents, interpretability tools, and RL environments.</li>
<li data-section-id="1vrwgaa" data-start="4336" data-end="4471">Define and pursue a high-impact research agenda that advances Vmax’s goal of open-ended learning beyond imitation of human expertise.</li>
</ul>
<h2 data-section-id="w1j6vz" data-start="1480" data-end="1503">Minimum Requirements</h2>
<ul data-start="1505" data-end="2409">
<li data-section-id="jzpo23" data-start="1505" data-end="1638">PhD or equivalent experience in machine learning, reinforcement learning, or a closely related field.</li>
<li data-section-id="133n9jr" data-start="1639" data-end="1795">Track record of research excellence, as demonstrated by publications, open-source work, deployed AI systems, or other substantial technical contributions.</li>
<li data-section-id="9085d6" data-start="1796" data-end="1931">Deep understanding of modern machine learning, especially reinforcement learning, representation learning, and large language models.</li>
<li data-section-id="o0dcx3" data-start="1932" data-end="2066">Strong familiarity with LLM post-training methods.</li>
<li data-section-id="1oimshc" data-start="2067" data-end="2199">Experience designing and running rigorous ML experiments, including ablations, baselines, evaluation design, and failure analysis.</li>
<li data-section-id="flbc6g" data-start="2200" data-end="2283">Expertise with Python and at least one major ML framework such as PyTorch or JAX.</li>
<li data-section-id="fpdhga" data-start="2284" data-end="2409">Ability to work independently on open-ended research problems and turn ambiguous ideas into concrete experimental programs.</li>
</ul>
<h2 data-section-id="17hey2t" data-start="2411" data-end="2426">Nice to have</h2>
<ul data-start="2428" data-end="3227">
<li data-section-id="65jwjq" data-start="2428" data-end="2608">Experience with mechanistic interpretability techniques such as activation patching, probing, sparse autoencoders, and feature attribution.</li>
<li data-section-id="71xo1j" data-start="2609" data-end="2728">Experience training or evaluating language-model agents in interactive, tool-using, or multi-step reasoning settings.</li>
<li data-section-id="w0d33i" data-start="2729" data-end="2856">Familiarity with scalable RL infrastructure, distributed training, experiment tracking, and large-scale evaluation pipelines.</li>
<li data-section-id="1x11tys" data-start="2857" data-end="2968">Experience developing reward models, verifiers, process supervision methods, or automated evaluation systems.</li>
<li data-section-id="16kmbff" data-start="2969" data-end="3110">Demonstrated software engineering ability, especially in research codebases that require reliability, reproducibility, and iteration speed.</li>
<li data-section-id="1213hs4" data-start="3111" data-end="3227">Ability to present technical results and their strategic implications to both research and non-research audiences.</li>
</ul>
<h2><strong>Role-specific location policy</strong></h2>
<ul>
<li>This role is based in our San Francisco office; for exceptional candidates, we are willing to consider a hybrid arrangement.</li>
</ul>
<h2>Compensation</h2>
<p>The expected salary range for this position is $300,000 to $500,000 USD.</p>
