Member of Technical Staff (MTS) - Multimodal Foundation Models

Deeproute.ai

Fremont · On-site · 1w ago

Seniority: Staff

About the role

Focus

Multimodal Foundation Models · Representation Learning · Method Innovation

We are looking for strong technical builders and researchers who understand foundation models and representation learning deeply, beyond simply applying existing frameworks.

Ideal candidates should have:

Strong experimental rigor
Solid systems and modeling intuition
Hands-on engineering ability
Interest in scalable multimodal AI systems for real-world autonomy

We value people who can bridge research and production, and who care about robustness, scalability, efficiency, and practical deployment in large-scale autonomous driving systems.

Responsibilities

1. Large-Scale Foundation Model Pretraining

Develop scalable pretraining pipelines for large-scale multimodal driving data
Design and optimize training strategies for:

Vision-language-action models
Video foundation models
Long-context temporal modeling
Multimodal representation alignment

Improve:

Training stability
Data efficiency
Scaling efficiency
Representation robustness

Work on distributed training systems and large-scale model optimization using frameworks such as:

PyTorch Distributed
DeepSpeed
Megatron-LM

2. Representation Learning & Method Innovation

Design and improve self-supervised and multimodal learning methods for real-world autonomous driving systems
Conduct architecture-level research on:

Vision Transformers (ViT)
Video / temporal architectures
Multimodal fusion and alignment
Embedding and retrieval systems
Long-context and memory-efficient architectures

Explore and improve:

Pretraining objectives
Loss functions
Training paradigms
Generalization and robustness

Analyze model behavior through:

Rigorous ablation studies
Failure case analysis
Representation probing and evaluation

3. Efficient Foundation Models & Scalable Deployment

Improve the efficiency, scalability, and deployability of large multimodal foundation models for real-world autonomous driving systems
Work on areas such as:

Model quantization
Knowledge distillation
Efficient attention mechanisms
Sparse architectures and Mixture-of-Experts (MoE)
Long-context and memory-efficient modeling
Inference acceleration and serving optimization
Training and inference system efficiency

Optimize model throughput, latency, memory usage, and deployment performance for large-scale production environments

Requirements

MS or PhD in:

Computer Vision
Machine Learning
Robotics
Computer Science
Related fields

Strong understanding of:

Foundation models
Self-supervised learning
Representation learning
Multimodal learning
Large-scale pretraining

Hands-on experience with methods such as:

CLIP
DINO / DINOv2
MAE
Contrastive learning
Masked modeling
MoE or scalable transformer architectures

Experience with one or more of the following is highly valued:

Video foundation models
Long-context modeling
Retrieval systems
Efficient inference
Distributed training
Model compression and deployment optimization

A strong publication record at top-tier venues such as the following is preferred:

CVPR
ICCV
ECCV
NeurIPS
ICLR
ICML
