
Senior AI/ML Engineer: LLM & Agent Stack

TrueFoundry

Bengaluru · Hybrid · 1mo ago
Seniority
Senior

About the role

<p><strong>About TrueFoundry:</strong></p> <p>Every production AI system, whether it's powering customer support, writing code, analyzing financial data, or diagnosing medical conditions, needs the same foundational infrastructure.</p> <p>A way to route between models. A way to manage tools and integrate them securely. A way to orchestrate agents and enforce governance. A unified compute layer to run it all.</p> <p><strong>That infrastructure layer is being built right now.</strong></p> <p>We're TrueFoundry, and we're building it. We're looking for a Senior AI/ML Engineer: LLM &amp; Agent Stack to join the team.</p> <h2><strong>The Problem We're Solving</strong></h2> <p>Companies are moving beyond simple chatbots to production agentic systems. These systems route between OpenAI, Anthropic, Google, and self-hosted models. They integrate dozens of tools via protocols like MCP. They orchestrate multi-agent workflows where agents coordinate with other agents.</p> <p>The infrastructure to support this doesn't exist yet. You can't just duct-tape together a few API calls and call it production-ready.</p> <p>You need a control plane and a compute layer that together provide:</p> <ul> <li>Intelligent routing with observability, cost policies, and fallback logic</li> <li>Centralized tool and MCP server management with security and lifecycle controls</li> <li>Agent orchestration with governance and guardrails</li> <li>A unified compute layer to run self-hosted models, custom tools, and agents</li> </ul> <p>We've built two products to solve this:</p> <p><strong>AI Gateway</strong> is the control plane: five composable components (Prompts, LLM Gateway, MCP Gateway, Guardrails, Agent Gateway) that handle routing, orchestration, and governance.</p> <p><strong>AI Deploy</strong> is the compute layer: a Kubernetes-based platform that abstracts ML workloads as standard software primitives, so everything runs on unified infrastructure.</p> <p>We're Series A, backed by Intel Capital and Sequoia. 
Companies like CVS, Mastercard, Siemens, Paytm, Synopsys, and Zscaler run production AI workloads on our platform.</p> <h2><strong data-renderer-mark="true">Role summary</strong></h2> <p data-renderer-start-pos="3502">You’ll design and own core components that enable enterprise customers to run production agentic AI safely and efficiently on TrueFoundry. This includes building robust orchestration for multi-step agents (graph-based, stateful workflows), model routing logic, observability, and policy enforcement (cost, data residency, rate limiting), as well as integrating upstream tooling such as LangGraph, LangChain, vector stores, and specialized LLM runtimes.</p> <h2 id="What-you’ll-do.1" data-renderer-start-pos="3938"><strong data-renderer-mark="true">What you’ll do</strong></h2> <ul class="ak-ul" data-indent-level="1"> <li> <p data-renderer-start-pos="3956">Architect and implement scalable agent orchestration patterns (graph-based executors, state management, multi-agent coordination) for production workloads.</p> </li> <li> <p data-renderer-start-pos="4115">Own critical integrations: model adapters, LLM gateway hooks, vector DBs, tools &amp; external APIs, and the platform’s LLMOps flows.</p> </li> <li> <p data-renderer-start-pos="4248">Build and improve tracing, benchmarking, and observability for LLMs and agents: token and cost accounting, p95 latency, throughput, and correctness checks.</p> </li> <li> <p data-renderer-start-pos="4403">Drive the design of safety guardrails: moderation hooks, human-in-the-loop checkpoints, replayable audit trails, and policy enforcement.</p> </li> <li> <p data-renderer-start-pos="4539">Mentor junior engineers, run design reviews, and improve engineering practices (testing, CI/CD, chaos testing for agents).</p> </li> <li> <p data-renderer-start-pos="4665">Work directly with strategic customers to prototype complex agentic solutions and translate them into product features.</p> </li> </ul> <h2 id="Must-have.1" 
data-renderer-start-pos="4788"><strong data-renderer-mark="true">Must-have</strong></h2> <ul class="ak-ul" data-indent-level="1"> <li> <p data-renderer-start-pos="4801">4–10 years of software engineering experience, including substantial work building distributed systems, infrastructure, or ML platforms.</p> </li> <li> <p data-renderer-start-pos="4920">Deep practical experience integrating and deploying LLMs in production (RAG, retrieval, embedding pipelines).</p> </li> <li> <p data-renderer-start-pos="5034">Hands-on experience with agent orchestration frameworks (LangGraph, LangChain, or custom agent runtimes) and stateful workflow design.</p> </li> <li> <p data-renderer-start-pos="5278">Proven track record of building observability, cost controls, and policy enforcement for production services.</p> </li> </ul> <h2 id="Preferred-/-differentiators" data-renderer-start-pos="5388"><strong data-renderer-mark="true">Preferred / differentiators</strong></h2> <ul class="ak-ul" data-indent-level="1"> <li> <p data-renderer-start-pos="5419">Experience building or contributing to open-source LLM orchestration tools (LangGraph, LangChain, or similar).</p> </li> <li> <p data-renderer-start-pos="5533">Familiarity with enterprise constraints such as hybrid on-prem/cloud deployments, data residency, and compliance requirements.</p> </li> <li> <p data-renderer-start-pos="5652">Background in security, privacy, or model governance for LLMs.</p> </li> <li> <p data-renderer-start-pos="5718">Demonstrated leadership in cross-functional projects and direct customer engagement.</p> </li> </ul> <h2 id="Qualifications-&amp;-signals-we-like.1" data-renderer-start-pos="5806"><strong data-renderer-mark="true">Qualifications &amp; signals we like</strong></h2> <ul class="ak-ul" data-indent-level="1"> <li> <p data-renderer-start-pos="5895">Open-source contributions, architecture blogs, or public talks on agentic LLMs or LLMOps.</p> </li> <li> <p data-renderer-start-pos="5988">Examples of productizing research or shipping 
complex infra features.</p> </li> </ul>
