Senior AI/ML Engineer: LLM & Agent Stack

TrueFoundry

Bengaluru, Hybrid, 1mo ago

Seniority: Senior

About the role

About TrueFoundry: Every production AI system, whether it's powering customer support, writing code, analyzing financial data, or diagnosing medical conditions, needs the same foundational infrastructure. A way to route between models. A way to manage tools and integrate them securely. A way to orchestrate agents and enforce governance. A unified compute layer to run it all. That infrastructure layer is being built right now. We're TrueFoundry, and we're building it. We're looking for a Senior AI/ML Engineer: LLM & Agent Stack to join the team.
<h2>The Problem We're Solving</h2>
Companies are moving beyond simple chatbots to production agentic systems. These systems route between OpenAI, Anthropic, Google, and self-hosted models. They integrate dozens of tools via protocols like MCP. They orchestrate multi-agent workflows where agents coordinate with other agents. The infrastructure to support this doesn't exist yet. You can't just duct-tape together a few API calls and call it production-ready. You need a control plane that handles:
<ul>
<li>Intelligent routing with observability, cost policies, and fallback logic</li>
<li>Centralized tool and MCP server management with security and lifecycle controls</li>
<li>Agent orchestration with governance and guardrails</li>
<li>A unified compute layer to run self-hosted models, custom tools, and agents</li>
</ul>
We've built two products to solve this. AI Gateway is the control plane: five composable components (Prompts, LLM Gateway, MCP Gateway, Guardrails, Agent Gateway) that handle routing, orchestration, and governance. AI Deploy is the compute layer: a Kubernetes-based platform that abstracts ML workloads as standard software primitives, so everything runs on unified infrastructure. We're Series A, backed by Intel Capital and Sequoia. Companies like CVS, Mastercard, Siemens, Paytm, Synopsys, and Zscaler run production AI workloads on our platform.
Role summary: You’ll design and own core components that enable enterprise customers to run production agentic AI safely and efficiently on TrueFoundry. This includes building robust orchestration for multi-step agents (graph-based and stateful workflows), model routing logic, observability, and policy enforcement (cost, data residency, rate limiting), and integrating upstream tooling such as LangGraph, LangChain, vector stores, and specialized LLM runtimes.
<h2 id="What-you’ll-do.1" data-renderer-start-pos="3938">What you’ll do</h2>
<ul class="ak-ul" data-indent-level="1">
<li> Architect and implement scalable agent orchestration patterns (graph-based executors, state management, multi-agent coordination) for production workloads. </li>
<li> Own critical integrations: model adapters, LLM gateway hooks, vector DBs, tools & external APIs, and the platform’s LLMOps flows. </li>
<li> Build and improve tracing, benchmarking, and observability for LLMs and agents: token/cost accounting, p95 latency, throughput, and correctness checks. </li>
<li> Drive design for safety and guardrails: moderation hooks, human-in-the-loop checkpoints, replayable audit trails, and policy enforcement. </li>
<li> Mentor junior engineers, run design reviews, and improve engineering practices (testing, CI/CD, chaos testing for agents). </li>
<li> Work directly with strategic customers to prototype complex agentic solutions and translate them into product features. </li>
</ul>
<h2 id="Must-have.1" data-renderer-start-pos="4788">Must-have</h2>
<ul class="ak-ul" data-indent-level="1">
<li> 4–10 years of software engineering experience, including substantial work building distributed systems, infrastructure, or ML platforms. </li>
<li> Deep practical experience integrating and deploying LLMs in production (RAG, retrieval, embedding pipelines). </li>
<li> Hands-on experience with agent orchestration frameworks (LangGraph / LangChain or custom agent runtimes) and stateful workflow design. </li>
<li> Proven track record building observability, cost controls, and policy enforcement for production services. </li>
</ul>
<h2 id="Preferred-/-differentiators" data-renderer-start-pos="5388">Preferred / differentiators</h2>
<ul class="ak-ul" data-indent-level="1">
<li> Experience building or contributing to open-source LLM orchestration tools (LangGraph, LangChain, or similar). </li>
<li> Familiarity with enterprise constraints: on-prem/cloud hybrid deployments, data residency, compliance requirements. </li>
<li> Background in security, privacy, or model governance for LLMs. </li>
<li> Demonstrated leadership in cross-functional projects and direct customer engagement. </li>
</ul>
<h2 id="Qualifications-&-signals-we-like.1" data-renderer-start-pos="5806">Qualifications & signals we like</h2>
<ul class="ak-ul" data-indent-level="1">
<li> Open-source contributions, architecture blogs, or public talks on agentic LLMs or LLMOps. </li>
<li> Examples of productizing research or shipping complex infra features. </li>
</ul>
