AI Engineer
intandem
Worldwide · $100k–135k · Remote · 2d ago
- Employment: Permanent Full Time
About the role
What you will accomplish:
- Run the inference serving layer on our own GPU hardware: choose and tune the serving stack (vLLM, SGLang, TensorRT-LLM) for high throughput and low latency.
- Optimize aggressively: tensor parallelism, quantization (FP8, AWQ, GPTQ), KV-cache and prefix caching, continuous batching, speculative decoding, concurrency tuning.
- Serve multiple models and features off shared hardware: multi-LoRA, routing, and request scheduling that balances internal workloads against latency-sensitive product traffic.
- Make our AI workloads efficient: improve latency, throughput, and GPU utilization so we get the most out of what we run.
- Build the visibility: instrument performance and usage across our AI surfaces so there's clear data on how everything is running.
- Surface the technical tradeoffs (performance, latency, efficiency) so the people making the calls have what they need to make them.
- Ship the in-app agent layer that helps families coordinate: proactive nudges, smart suggestions, agents that summarize, draft, schedule, and act for busy parents.
- Build the substrate underneath: tools, memory, orchestration, guardrails, and evaluation harnesses, integrated cleanly with production APIs alongside our architecture team.
- Work in nimble pairs with feature owners, standing up whatever's needed to test an idea, including a vibe-coded UI when that's the fastest path to a real customer. Ship rough, learn fast, harden what works.
Who you are:
- Technical and hands-on with infrastructure: you like running real systems on real hardware and keeping them fast and reliable.
- A full-stack builder who wants the app layer too: you don't want to be boxed into infra. When a feature needs shipping, you want to pick it up and ship it, not just hand it off.
- Performance-minded: you treat latency, throughput, and efficiency as things to engineer deliberately.
- A rapid prototyper who works AI-first, with modern tooling (Claude Code, agent SDKs) as part of your craft.
- Motivated by work that matters. Families rely on these products during real moments in their lives.
What you bring:
- 5+ years shipping production software, including meaningful applied AI or ML work.
- Demonstrated experience running and optimizing self-hosted LLMs on dedicated multi-GPU hardware: a serving stack (vLLM, SGLang, or TensorRT-LLM) and the optimization that comes with it (tensor parallelism, quantization, batching, KV cache).
- A track record of optimizing inference performance and efficiency (latency, throughput, GPU utilization).
- Strong Python and engineering fundamentals, with the full-stack range to stand up a quick UI and a genuine desire to work on app-layer features, not only infra.
- Hands-on with agent frameworks (Claude Agent SDK, LangGraph, or similar), LLM APIs, embeddings, and RAG.
- Comfortable with AWS and the devops this role owns: Docker, CI/CD, monitoring, and observability.
- Experience building internal tooling or platforms others depend on. Bonus for Slack apps, MCP, or agent orchestration at team scale.
Perks & benefits:
- Medical: In Tandem pays 100% of the premium for employees and 99% for all additional family members
- 401k: Up to a 4% match with immediate vesting
- Paid leave for all new parents
- Learning & Development stipend for employees
- Paid Time Off: 11 Holidays + Winter Break (3 Days) + Volunteer Time Off (1 Day) + Floating Holiday (1 Day)
- Personal Time Off: 15 days for 0-1 years of employment, 20 days for 1-3 years of employment
- Supportive and flexible working environment – work from anywhere!