Member of Technical Staff (MTS) - Multimodal Foundation Models

Deeproute.ai

Fremont · On-site · 1w ago
Seniority
Staff

About the role

Focus

Multimodal Foundation Models · Representation Learning · Method Innovation

We are looking for strong technical builders and researchers whose understanding of foundation models and representation learning goes deeper than applying existing frameworks.

Ideal candidates should have:

  • Strong experimental rigor
  • Solid systems and modeling intuition
  • Hands-on engineering ability
  • Interest in scalable multimodal AI systems for real-world autonomy

We value people who can bridge research and production, and who care about robustness, scalability, efficiency, and practical deployment in large-scale autonomous driving systems.

Responsibilities

1. Large-Scale Foundation Model Pretraining

  • Develop scalable pretraining pipelines for large-scale multimodal driving data
  • Design and optimize training strategies for:
    • Vision-language-action models
    • Video foundation models
    • Long-context temporal modeling
    • Multimodal representation alignment
  • Improve:
    • Training stability
    • Data efficiency
    • Scaling efficiency
    • Representation robustness
  • Work on distributed training systems and large-scale model optimization using frameworks such as:
    • PyTorch Distributed
    • DeepSpeed
    • Megatron-LM

2. Representation Learning & Method Innovation

  • Design and improve self-supervised and multimodal learning methods for real-world autonomous driving systems
  • Conduct architecture-level research on:
    • Vision Transformers (ViT)
    • Video / temporal architectures
    • Multimodal fusion and alignment
    • Embedding and retrieval systems
    • Long-context and memory-efficient architectures
  • Explore and improve:
    • Pretraining objectives
    • Loss functions
    • Training paradigms
    • Generalization and robustness
  • Analyze model behavior through:
    • Rigorous ablation studies
    • Failure case analysis
    • Representation probing and evaluation

3. Efficient Foundation Models & Scalable Deployment

  • Improve the efficiency, scalability, and deployability of large multimodal foundation models for real-world autonomous driving systems
  • Work on areas such as:
    • Model quantization
    • Knowledge distillation
    • Efficient attention mechanisms
    • Sparse architectures and Mixture-of-Experts (MoE)
    • Long-context and memory-efficient modeling
    • Inference acceleration and serving optimization
    • Training and inference system efficiency
  • Optimize model throughput, latency, memory usage, and deployment performance for large-scale production environments

Requirements

  1. MS or PhD in:
      • Computer Vision
      • Machine Learning
      • Robotics
      • Computer Science
      • Related fields
  2. Strong understanding of:
      • Foundation models
      • Self-supervised learning
      • Representation learning
      • Multimodal learning
      • Large-scale pretraining
  3. Hands-on experience with methods such as:
      • CLIP
      • DINO / DINOv2
      • MAE
      • Contrastive learning
      • Masked modeling
      • MoE or scalable transformer architectures
  4. Experience with one or more of the following is highly valued:
      • Video foundation models
      • Long-context modeling
      • Retrieval systems
      • Efficient inference
      • Distributed training
      • Model compression and deployment optimization
  5. Strong publication record in top-tier venues is preferred:
      • CVPR
      • ICCV
      • ECCV
      • NeurIPS
      • ICLR
      • ICML
