Founding Engineer, AI Infra

Cox Exponential

San Francisco Bay Area · Hybrid · 1w ago
Employment
Full-time

About the role

About Goaly

Key Responsibilities

  • Efficiency & performance: Improve LLM training and inference efficiency through better memory utilization, optimized parallelism, and kernel-level innovations (e.g. FlashAttention, CUDA/Triton).
  • Training & RL robustness: Build scalable, stable training and RL pipelines with strong reproducibility, observability, and debuggability.
  • Serving & inference optimization: Design and tune high-throughput, low-latency model serving systems, including quantization, caching, and speculative decoding.
  • Scalability & infrastructure: Own end-to-end training and inference infrastructure, from data ingestion and checkpointing to multi-GPU and multi-cloud orchestration.
  • Production enablement: Work closely with researchers and product engineers to turn new algorithms into reliable, production-ready systems.

Requirements

  • 5+ years building or operating ML infrastructure at scale, ideally supporting large language or multimodal models.
  • Deep understanding of GPU architecture, distributed training frameworks (PyTorch, DeepSpeed, Megatron, Ray), and parallelism strategies.
  • Hands-on experience running inference stacks (vLLM, SGLang, TGI, Triton) and optimizing them via low-level profiling.
  • Strong software engineering fundamentals in Python and one of C++/Rust/Go, with clean, reliable code shipped to production.
  • Working knowledge of modern data pipelines, feature stores, and vector databases used in production AI systems.
  • Comfort automating infrastructure with Kubernetes, Terraform/Pulumi, and observability stacks (Prometheus, Grafana, OpenTelemetry).

Bonus Points

  • Experience deploying open-source LLMs (Llama 3, Qwen, DeepSeek) or training custom foundation models.
  • Contributions to ML systems tooling (compilers, kernels, inference runtimes) or open-source infrastructure projects.
  • Background in reinforcement learning, evaluation harnesses, or alignment tooling that hardens production AI systems.
