Engineer, Production Engineering

Guild.ai
San Francisco (On-site)
Employment: Full-time

Engineer — Production Engineering

Location: San Francisco Bay Area (Hybrid/Onsite)
Type: Full-time
Stage: Early-stage startup

About the Role

We are building the control plane for AI agents in teams and companies.

As a Production Engineer, you will own the infrastructure, security, and compliance systems that allow our platform to ship fast and run reliably at scale. This is not a traditional ops role — you will write real code, contribute directly to the product, and own the full security and compliance surface of an early-stage company.

You'll work across Kubernetes infrastructure, cloud delivery, agent sandboxing, SOC2 compliance, IT systems, and production observability — and you'll contribute to the product itself, building security-sensitive features and auditing application code for vulnerabilities.

If you want to own the production backbone for the agent-native era — from a Terraform module to a pentest to an API key implementation — we want to talk.

What You'll Own

1. Cloud & Kubernetes Infrastructure

  • Our Stack: Manage and evolve our production and staging infrastructure on GCP (GKE) using Terraform. Own DNS, networking, and environment configuration end-to-end.

  • Customer Environments: Deploy and operate within customer VPCs across AWS, Azure, and GCP — adapting to varied infrastructure constraints, security requirements, and enterprise networking configurations.

  • Agent Sandboxing: Build and maintain Kubernetes-based sandboxing for agent execution — ensuring agents operate within strict network boundaries and must route through our API gateway rather than having unfettered internet access.

  • Observability: Own our observability stack, including OpenTelemetry instrumentation and integrations with New Relic and Splunk, to give the team deep visibility into system performance and agent runtime behavior.

2. Security, Compliance & IT

  • SOC2 & Audits: Lead infrastructure and operational work to support SOC2 compliance, including audit preparation, evidence collection, and control implementation.

  • Penetration Testing & Bug Bounty: Manage our HackerOne engagement — coordinating pentests, triaging incoming bug bounty reports, and driving remediation.

  • Product Security: Audit application code for security vulnerabilities, contribute security-sensitive product features (e.g., API key management), and ensure product and infrastructure security are coherent end-to-end.

  • IT & Identity: Own our IT stack — Okta, device management, and access controls — keeping the company secure as we scale.

3. CI/CD & Progressive Delivery

  • Deployment Pipelines: Design and maintain safe, automated CI/CD workflows supporting rollout strategies like canary and blue-green deployments.

  • Release Velocity: Make shipping to production a routine, boring, highly automated non-event.

What We're Looking For

Strong Fit

  • Experience: 5+ years in Production Engineering, Platform Engineering, or a security-focused infrastructure role, ideally at a fast-growing startup or SaaS company.

  • Our Stack: Strong hands-on experience with Kubernetes and GCP in production; comfortable with Terraform for managing real infrastructure.

  • Code over Click: Strong programming skills (Python, Go, TypeScript, etc.) with a passion for automating away toil.

  • Security Depth: Hands-on experience with compliance frameworks (SOC2), vulnerability management, and secure system design.

Bonus Points

  • Background with multi-tenant SaaS or enterprise security and procurement requirements.

  • Exposure to AI/ML infrastructure, particularly agent runtimes.

  • Experience building security-sensitive product features alongside infrastructure work.

  • Experience supporting pentests and bug bounties.

  • Experience deploying and operating in customer VPCs or other external cloud environments across AWS, Azure, and/or GCP — navigating enterprise networking, security, and access constraints.

Why This Role Is Unique

  • Broad Ownership: You'll own the full security and compliance surface of an early-stage company — from SOC2 to sandboxed agent execution to IT — while also contributing directly to the product.

  • Agent Infrastructure: You'll design infrastructure for autonomous AI agents, not just traditional web services — introducing unique sandboxing, observability, and security challenges.

  • Our Infra and Theirs: You'll operate across both our own production environment and customer cloud environments, requiring you to be fluent across AWS, Azure, and GCP.

  • High Autonomy: As an early hire, you'll have a seat at the table to choose the tools and define the architecture that carries us to scale.

Who Thrives Here

  • Engineers who are as comfortable reading application code for vulnerabilities as they are writing a Terraform module.

  • People who enjoy owning the full security and compliance surface, not just one layer of it.

  • Builders who can navigate the constraints of customer enterprise environments without losing velocity.

  • Those who are energized — not overwhelmed — by the breadth of an early-stage technical operations role.
