
Member of Technical Staff, Perception

XDOF
San Francisco · Hybrid · 1d ago
Employment: Full-time
Seniority: Staff

About the role

At XDOF, we’re at an inflection point. Frontier labs are racing to build general-purpose robots, and high-quality training data is the bottleneck. We’re building the foundation behind the foundation models – the data collection systems, operational capability, exabyte-scale data warehouse, and software toolchain – to help our partners drive the field forward.

The Perception Algorithm team transforms raw multimodal sensor data into high-quality robot training annotations. You will work across the complete loop from data collection to model delivery: sensor calibration, SLAM localization, human pose estimation, perception model training, and embedded deployment. Your work directly determines the quality ceiling of our training data.

Core Responsibilities

Human Pose Estimation

  • Design and optimize hand pose estimation pipelines that support accurate joint-angle extraction from teleoperation data

  • Build full-body pose estimation systems that generate ground truth for motion capture and teleoperation action annotation

  • Research and apply markerless, vision-based pose estimation methods to reduce data collection costs

  • Fuse pose estimation outputs with robot joint angle data to generate consistent training annotations

Robot Perception & Calibration

  • Design and maintain intrinsic/extrinsic calibration pipelines for multi-camera arrays (factory calibration + online recalibration)

  • Build visual SLAM (V-SLAM) systems supporting real-time localization and scene reconstruction on data collection platforms

  • Implement hand-eye calibration between cameras and robot end-effectors

  • Develop temporal alignment solutions across multimodal sensors (cameras, IMU, data gloves, force sensors)

Perception Model Training & Deployment

  • Train and iterate on perception models including object detection, instance segmentation, and 6DoF pose estimation

  • Optimize model inference using TensorRT / CUDA for real-time performance on robot embedded platforms

  • Write custom CUDA kernels for low-level acceleration of perception tasks

  • Design evaluation metric frameworks for perception models; continuously track the relationship between model performance and data quality

End-to-End Loop from Data Collection to Model Delivery

  • Contribute to the design of automated annotation pipelines that convert sensor data into structured training labels

  • Build Auto QA modules to filter low-quality data, including anomalous frames, failed demonstrations, and sensor dropouts

  • Collaborate with ML engineers and data infrastructure teams to ensure perception output formats meet downstream vision-language-action (VLA) model training requirements

  • Establish feedback mechanisms linking perception accuracy to model training outcomes, continuously improving annotation quality

Requirements

Must-Have

  • 5+ years of industry experience in robot perception or computer vision

  • Strong 3D vision fundamentals: stereo and structured-light camera principles, 3D reconstruction

  • Proficiency with SLAM frameworks (ORB-SLAM, VINS-Mono, FAST-LIO, etc.) or V-SLAM system development experience

  • Hands-on engineering experience with human pose estimation: hand joints (MediaPipe, MANO) or full-body pose (OpenPose, SMPLify, etc.)

  • Proficient with deep learning frameworks for training, tuning, and evaluating perception models

  • TensorRT deployment experience with real-time inference optimization on embedded platforms (Jetson, Horizon, etc.)

  • CUDA programming fundamentals; ability to write or debug custom kernels

  • Proficient in C++ and Python with ROS / ROS2 development experience

  • Proficient with AI coding agents

Nice to Have

  • Engineering experience with 6DoF object pose estimation (FoundPose, FoundationPose, GDR-Net, etc.)

  • Familiarity with 3D Gaussian Splatting or NeRF for scene reconstruction or data augmentation

  • Experience with robot manipulation or teleoperation systems

  • End-to-end development experience with automated annotation pipelines or ground truth generation systems

  • Published research in perception, pose estimation, or robotics

What We Offer

  • Direct involvement in the most critical technical challenge in embodied intelligence: producing high-quality robot training data

  • The opportunity to work alongside top-tier robotics engineers and ML researchers

  • Access to proprietary hardware platforms (humanoid robots, camera arrays, data gloves)

  • A fast-paced, high-autonomy 0→1 work environment
