Inference

genesis

Paris, Hybrid
Employment
Full-time

About the role

What You’ll Do

  • Build low-latency inference pipelines for on-device deployment, enabling real-time next-token and diffusion-based control loops in robotics

  • Design and optimize distributed inference systems on GPU clusters, pushing throughput with large-batch serving and efficient resource utilization

  • Implement efficient low-level code (CUDA, Triton, custom kernels) and integrate it seamlessly into high-level frameworks

  • Optimize workloads for both throughput (batching, scheduling, quantization) and latency (caching, memory management, graph compilation)

  • Develop monitoring and debugging tools to guarantee reliability, determinism, and rapid diagnosis of regressions across both the on-device and cluster stacks

What You’ll Bring

  • Deep experience in distributed systems, ML infrastructure, or high-performance serving (8+ years)

  • Production-grade expertise in Python, with a strong background in systems languages (C++/Rust/Go)

  • Low-level performance mastery: CUDA, Triton, kernel optimization, quantization, memory and compute scheduling

  • Proven track record of scaling inference workloads in both throughput-oriented cluster environments and latency-critical on-device deployments

  • System-level mindset with a history of tuning hardware–software interactions for maximum efficiency, throughput, and responsiveness
