Member of Technical Staff, Vision / Language

XDOF

San FranciscoHybrid2w ago

Apply

Employment: Full-time
Seniority: Staff

About the role

About XDOF

Frontier labs are racing to build general-purpose robots, and the bottleneck isn't compute. It's data. At XDOF, we're building the foundation behind the foundation models: the data collection systems, annotation pipelines, exabyte-scale data infrastructure, and software toolchain that enable our partners to push the field forward.

We're hiring a Research Engineer / Scientist to help lead technical efforts at the intersection of vision-language models and robot learning. You will build systems that turn raw egocentric and teleoperation video into high-signal training data for VLA models, and increasingly, contribute to the models themselves.

Beyond pipelines, you will drive research into what makes robot data useful: discovering new metadata (contact events, affordance labels, implicit reward signals, dynamics priors from video) that unlock capabilities current approaches miss. You'll explore how structured annotations can improve cross-embodiment transfer, automatic curriculum generation, and world models that predict what actually matters for manipulation. The data layer isn't downstream of the research. It is the research.

What You'll Do

Design and implement vision-language pipelines for egocentric and teleoperation video: structured captioning, temporal grounding, action-conditioned scene understanding, and semantic annotation at scale
Develop and evaluate representations that bridge visual perception, language, and low-level robot action — spanning VLAs, video prediction, and world models
Build and improve data curation systems that assess quality, diversity, and coverage of large-scale robot demonstration datasets
Work hands-on with bimanual and high-DoF manipulation data, including real teleoperation footage and sim-generated rollouts
Collaborate directly with partner labs to define data requirements and close the loop between data quality and downstream policy performance
Stay current on the research frontier (VLAs, video foundation models, flow matching, DiT architectures, egocentric pretraining) and translate insights into production systems

Required:

MS or PhD in Computer Science, Robotics, Machine Learning, or a related field from a top-tier program
3–7 years of research or applied research experience (industry or academic) in one or more of: vision-language models, video understanding, robot learning, or generative modeling
Deep fluency in PyTorch; working knowledge of large-scale training infrastructure (distributed training, mixed precision, large batch workflows)
Published work or demonstrable impact in VLMs/VLAs, video representation learning, imitation learning, or a closely related area
Strong engineering fundamentals — you can design clean systems, not just run experiments

Benefits

Competitive compensation and equity
Comprehensive health and wellness benefits
Flexible work arrangements
Collaborative and fast-paced work environment
Opportunity to shape the future of robotics and AI alongside an ambitious, values-driven team

Level: Mid Level to Senior Research Scientist (L4–L5 equivalent) Location: San Mateo

Note: Junior candidates will still be considered

If you’re excited to help build the infrastructure powering tomorrow’s intelligent machines, we’d love to hear from you!

Perks & benefits

Equity Compensation

755,000+ hidden jobs like this

XDOF and thousands of companies post here first — often days before LinkedIn or Indeed. Your first 5 applications are free; go Pro to apply without limits.

Everything Pro unlocks:

Unlimited applications — free stops at 5
Track every application in one place
Apply straight to the source, one click
Save & organize roles you love
Roles pulled from company boards before the big sites

Weekly

$9.99

$4.99/week

For an active search. Cancel anytime.

Get Weekly

Monthly

$24.99

$12.99/month

The smart pick. Save 35% vs weekly.

Get Monthly

Lifetime

$99

$49.99once

Pay once. Every future feature, forever.

Get Lifetime