Member of Technical Staff - Applied RL
vmax
San Francisco · 4w ago
- Seniority: Staff
About the role
<h2><strong>About <em>V<sub>max</sub></em></strong></h2>
<p><em>V<sub>max</sub></em> is an applied research lab developing AI capable of open-ended learning. We are building systems to exceed humans in all capacities by optimising beyond the local maxima of learning from human expertise.</p>
<h2>About the role</h2>
<p>This role is for exceptional ML engineers who can turn RL research ideas into working training systems, evals, environments, and rewards. You will work across research and engineering to make post-training methods reliable, measurable, and fast to iterate on.</p>
<h2 data-section-id="r8dte7" data-start="1632" data-end="1651">Responsibilities</h2>
<ul data-start="1653" data-end="3025">
<li data-section-id="du7wfs" data-start="1653" data-end="1771">Build and improve RL training pipelines for language-model-based agents.</li>
<li data-section-id="5k5nbp" data-start="1772" data-end="1936">Translate research ideas into working implementations, including reward functions, verifiers, environment interfaces, rollout pipelines, and evaluation harnesses.</li>
<li data-section-id="1r4u4z9" data-start="1937" data-end="2075">Design experiments that test whether RL methods are actually improving model behavior, sample efficiency, robustness, or generalization.</li>
<li data-section-id="1yntypt" data-start="2076" data-end="2226">Create quality monitoring tools for RL experiments, including regression tests, eval suites, and reward-hacking checks.</li>
<li data-section-id="wg6txu" data-start="2227" data-end="2409">Debug unstable training runs, diagnose poor learning dynamics, and identify whether failures come from algorithms, rewards, data, infrastructure, or evals.</li>
<li data-section-id="ax0sp1" data-start="2410" data-end="2519">Build 0→1 systems for new RL workflows, then harden them into reusable infrastructure.</li>
<li data-section-id="hhuz8p" data-start="2520" data-end="2612">Improve the reliability, reproducibility, and speed of experimentation across RL projects.</li>
<li data-section-id="2sqe5" data-start="2779" data-end="2899">Own technically ambiguous projects end to end, from problem framing through implementation, evaluation, and iteration.</li>
</ul>
<h2 data-section-id="w1j6vz" data-start="3027" data-end="3050">Minimum Requirements</h2>
<ul data-start="3052" data-end="4118">
<li data-section-id="1jnffvp" data-start="3052" data-end="3238">Strong practical ML engineering ability, demonstrated through shipped systems, open-source projects, competitions, independent projects, or equivalent experience.</li>
<li data-section-id="u8wgbg" data-start="3239" data-end="3317">Hands-on experience building, training, evaluating, or debugging ML systems.</li>
<li data-section-id="u0z0yw" data-start="3318" data-end="3432">Strong programming ability in Python and experience with at least one major ML framework such as PyTorch or JAX.</li>
<li data-section-id="1n4m03k" data-start="3433" data-end="3544">Working understanding of reinforcement learning, supervised learning, optimization, and modern deep learning.</li>
<li data-section-id="625i5r" data-start="3545" data-end="3649">Ability to independently take an ambiguous technical problem and drive it to a working implementation.</li>
<li data-section-id="1g9kmpk" data-start="3776" data-end="3871">Ability to collaborate closely with researchers while maintaining high engineering standards.</li>
<li data-section-id="fwg7ax" data-start="3872" data-end="3978">Experience building systems that are reliable, maintainable, and usable by other technical team members.</li>
<li data-section-id="8i8v1l" data-start="3979" data-end="4086">Clear written and verbal communication.</li>
</ul>
<h2 data-section-id="17hey2t" data-start="9410" data-end="9425">Nice to have</h2>
<ul data-start="9427" data-end="10566">
<li data-section-id="1a9mbmz" data-start="9784" data-end="9847">Experience supporting research teams or fast-moving ML teams.</li>
<li data-section-id="re633z" data-start="9848" data-end="9975">Expertise in building experiment tracking, evaluation platforms, dataset/versioning systems, or reproducibility infrastructure.</li>
<li data-section-id="ibmzoi" data-start="10091" data-end="10203">Experience at an organization with a high engineering bar, where reliability, ownership, and code quality were central.</li>
<li data-section-id="1t8b5pk" data-start="10462" data-end="10566">Experience reducing operational complexity in systems that had become brittle, slow, or hard to debug.</li>
</ul>
<h2><strong>Role specific location policy</strong></h2>
<ul>
<li>This role is based in our San Francisco office. For exceptional candidates, we are willing to consider a hybrid arrangement.</li>
</ul>
<h2>Compensation</h2>
<p>The expected salary range for this position is $300,000 to $500,000 USD.</p>