Senior AI/ML Engineer: LLM & Agent Stack
TrueFoundry
Bengaluru · Hybrid · 1mo ago
- Seniority: Senior
About the role
<p><strong>About TrueFoundry:</strong></p>
<p>Every production AI system, whether it's powering customer support, writing code, analyzing financial data, or diagnosing medical conditions, needs the same foundational infrastructure.</p>
<p>A way to route between models. A way to manage tools and integrate them securely. A way to orchestrate agents and enforce governance. A unified compute layer to run it all.</p>
<p><strong>That infrastructure layer is being built right now.</strong></p>
<p>We're TrueFoundry, and we're building it. We're looking for a Senior AI/ML Engineer (LLM &amp; Agent Stack) to join the team.</p>
<h2><strong>The Problem We're Solving</strong></h2>
<p>Companies are moving beyond simple chatbots to production agentic systems. These systems route between OpenAI, Anthropic, Google, and self-hosted models. They integrate dozens of tools via protocols like MCP. They orchestrate multi-agent workflows where agents coordinate with other agents.</p>
<p>The infrastructure to support this doesn't exist yet. You can't just duct-tape together a few API calls and call it production-ready.</p>
<p>You need a control plane that handles:</p>
<ul>
<li>Intelligent routing with observability, cost policies, and fallback logic</li>
<li>Centralized tool and MCP server management with security and lifecycle controls</li>
<li>Agent orchestration with governance and guardrails</li>
<li>A unified compute layer to run self-hosted models, custom tools, and agents</li>
</ul>
<p>We've built two products to solve this:</p>
<p><strong>AI Gateway</strong> is the control plane: a set of five composable components (Prompts, LLM Gateway, MCP Gateway, Guardrails, Agent Gateway) that handle routing, orchestration, and governance.</p>
<p><strong>AI Deploy</strong> is the compute layer, a Kubernetes-based platform that abstracts ML workloads as standard software primitives, so everything runs on unified infrastructure.</p>
<p>We're Series A, backed by Intel Capital and Sequoia. Companies like CVS, Mastercard, Siemens, Paytm, Synopsys, and Zscaler run production AI workloads on our platform.</p>
<p><strong data-renderer-mark="true">Role summary</strong></p>
<p data-renderer-start-pos="3502">You’ll design and own core components that enable enterprise customers to run production agentic AI safely and efficiently on TrueFoundry. This includes building robust orchestration for multi-step agents (graph-based, stateful workflows), model routing logic, observability, and policy enforcement (cost, data residency, rate limiting), as well as integrating upstream tooling like LangGraph, LangChain, vector stores, and specialized LLM runtimes.</p>
<h2 id="What-you’ll-do.1" data-renderer-start-pos="3938"><strong data-renderer-mark="true">What you’ll do</strong></h2>
<ul class="ak-ul" data-indent-level="1">
<li>
<p data-renderer-start-pos="3956">Architect and implement scalable agent orchestration patterns (graph-based executors, state management, multi-agent coordination) for production workloads.</p>
</li>
<li>
<p data-renderer-start-pos="4115">Own critical integrations: model adapters, LLM gateway hooks, vector DBs, tools and external APIs, and the platform’s LLMOps flows.</p>
</li>
<li>
<p data-renderer-start-pos="4248">Build and improve tracing, benchmarking, and observability for LLMs and agents: token/cost accounting, p95 latency, throughput, and correctness checks.</p>
</li>
<li>
<p data-renderer-start-pos="4403">Drive the design of safety guardrails: moderation hooks, human-in-the-loop checkpoints, replayable audit trails, and policy enforcement.</p>
</li>
<li>
<p data-renderer-start-pos="4539">Mentor junior engineers, run design reviews, and improve engineering practices (testing, CI/CD, chaos testing for agents).</p>
</li>
<li>
<p data-renderer-start-pos="4665">Work directly with strategic customers to prototype complex agentic solutions and translate them into product features.</p>
</li>
</ul>
<h2 id="Must-have.1" data-renderer-start-pos="4788"><strong data-renderer-mark="true">Must-have</strong></h2>
<ul class="ak-ul" data-indent-level="1">
<li>
<p data-renderer-start-pos="4801">4–10 years of software engineering with substantial experience building distributed systems, infra, or ML platforms.</p>
</li>
<li>
<p data-renderer-start-pos="4920">Deep practical experience integrating and deploying LLMs in production (RAG, retrieval, embeddings pipelines).</p>
</li>
<li>
<p data-renderer-start-pos="5034">Hands-on experience with agent orchestration frameworks (LangGraph / LangChain or custom agent runtimes) and stateful workflow design.</p>
</li>
<li>
<p data-renderer-start-pos="5278">Proven track record building observability, cost controls, and policy enforcement for production services.</p>
</li>
</ul>
<h2 id="Preferred-/-differentiators" data-renderer-start-pos="5388"><strong data-renderer-mark="true">Preferred / differentiators</strong></h2>
<ul class="ak-ul" data-indent-level="1">
<li>
<p data-renderer-start-pos="5419">Experience building or contributing to open-source LLM orchestration tools (LangGraph, LangChain, or similar).</p>
</li>
<li>
<p data-renderer-start-pos="5533">Familiarity with enterprise constraints: on-prem/cloud hybrid deployments, data residency, compliance requirements.</p>
</li>
<li>
<p data-renderer-start-pos="5652">Background in security, privacy, or model governance for LLMs.</p>
</li>
<li>
<p data-renderer-start-pos="5718">Demonstrated leadership in cross-functional projects and direct customer engagement.</p>
</li>
</ul>
<h2 id="Qualifications-&-signals-we-like.1" data-renderer-start-pos="5806"><strong data-renderer-mark="true">Qualifications & signals we like</strong></h2>
<ul class="ak-ul" data-indent-level="1">
<li>
<p data-renderer-start-pos="5895">Open-source contributions, architecture blogs, or public talks on agentic LLMs or LLMOps.</p>
</li>
<li>
<p data-renderer-start-pos="5988">Examples of productizing research or shipping complex infra features.</p>
</li>
</ul>