
- Employment: Full-time
- Seniority: Staff
About the role
Introducing Moonlake, AI for creating world simulations.
Scope of Work
Training efficiency
Dataloaders, fusion, activation remat, gradient checkpointing.
FSDP/ZeRO, tensor + pipeline parallelism; NCCL tuning.
GPU + kernel performance
Nsight profiling, Triton/CUDA kernels, fused ops.
Flash-attention–style speedups, sequence packing, KV-cache tricks.
Inference optimization
Low-latency serving, continuous batching, speculative decoding.
Quantization (GPTQ/AWQ), distillation, pruning.
Infra + reliability
SLURM/K8s multi-node jobs, checkpoint hygiene.
Determinism, env pinning, GPU failure handling.
We are committed to being an on-site, in-person team, currently based in San Mateo.