Member of Technical Staff, Vision / Language

XDOF
San Francisco · Hybrid · 2w ago
Employment
Full-time
Seniority
Staff

About the role

About XDOF

Frontier labs are racing to build general-purpose robots, and the bottleneck isn't compute. It's data. At XDOF, we're building the foundation behind the foundation models: the data collection systems, annotation pipelines, exabyte-scale data infrastructure, and software toolchain that enable our partners to push the field forward.

We're hiring a Research Engineer / Scientist to help lead technical efforts at the intersection of vision-language models and robot learning. You will build systems that turn raw egocentric and teleoperation video into high-signal training data for vision-language-action (VLA) models and, increasingly, contribute to the models themselves.

Beyond pipelines, you will drive research into what makes robot data useful: discovering new metadata (contact events, affordance labels, implicit reward signals, dynamics priors from video) that unlocks capabilities current approaches miss. You'll explore how structured annotations can improve cross-embodiment transfer, automatic curriculum generation, and world models that predict what actually matters for manipulation. The data layer isn't downstream of the research. It is the research.

What You'll Do

  • Design and implement vision-language pipelines for egocentric and teleoperation video: structured captioning, temporal grounding, action-conditioned scene understanding, and semantic annotation at scale

  • Develop and evaluate representations that bridge visual perception, language, and low-level robot action — spanning VLAs, video prediction, and world models

  • Build and improve data curation systems that assess quality, diversity, and coverage of large-scale robot demonstration datasets

  • Work hands-on with bimanual and high-DoF manipulation data, including real teleoperation footage and sim-generated rollouts

  • Collaborate directly with partner labs to define data requirements and close the loop between data quality and downstream policy performance

  • Stay current on the research frontier (VLAs, video foundation models, flow matching, DiT architectures, egocentric pretraining) and translate insights into production systems

Required:

  • MS or PhD in Computer Science, Robotics, Machine Learning, or a related field from a top-tier program

  • 3–7 years of research or applied research experience (industry or academic) in one or more of: vision-language models, video understanding, robot learning, or generative modeling

  • Deep fluency in PyTorch; working knowledge of large-scale training infrastructure (distributed training, mixed precision, large batch workflows)

  • Published work or demonstrable impact in VLMs/VLAs, video representation learning, imitation learning, or a closely related area

  • Strong engineering fundamentals — you can design clean systems, not just run experiments

Benefits

  • Competitive compensation and equity

  • Comprehensive health and wellness benefits

  • Flexible work arrangements

  • Collaborative and fast-paced work environment

  • Opportunity to shape the future of robotics and AI alongside an ambitious, values-driven team

Level: Mid Level to Senior Research Scientist (L4–L5 equivalent)

Location: San Mateo

Note: Junior candidates will still be considered.

If you’re excited to help build the infrastructure powering tomorrow’s intelligent machines, we’d love to hear from you!
