Principal Platform Engineer

edisonscientific

San Francisco · 3mo ago
Seniority
Staff

About the role

<h3><strong>About</strong></h3> <p>Edison Scientific builds and commercializes AI agents for science. Scientific discovery moves too slowly, and autonomous AI agents are how we intend to fix that. We're assembling a team of top researchers and engineers across AI and biology to build an AI scientist.</p> <hr> <h3><strong>Role</strong></h3> <p>As a <strong>Principal Platform Engineer</strong>, you'll play a key role in designing, scaling, and operating the core platform infrastructure that powers autonomous scientific discovery. Your primary focus will be orchestrating our agents at scale: building and managing clusters that run thousands of persistent, stateful workloads, developing custom resource definitions (CRDs) and operators, and ensuring the reliability and efficiency of our compute layer.</p> <p>Our mission is to build an AI scientist, and you'll own the infrastructure foundation it runs on. AI agents performing long-running scientific research demand far more resilient scheduling, lifecycle management, and resource orchestration than typical cloud-native workloads. This role will influence platform architecture, establish infrastructure best practices, and partner closely with backend engineers, ML engineers, and researchers to deliver a production-grade environment that lets science move faster.</p> <p>At Edison Scientific, engineering at the senior level is about technical ownership and leverage: understanding how complex systems interact, making sound architectural tradeoffs, and building foundations that allow teams and science to move faster.</p> <p>This role is on-site at our San Francisco office in the Dogpatch neighborhood.
Our office is a converted warehouse with high ceilings, open space, and a team that genuinely believes in what they're building.</p> <p>This position is part of the Platform team.</p> <hr> <h3><strong>Responsibilities</strong></h3> <ul> <li>Architect, implement, and operate Kubernetes clusters that support thousands of concurrent, persistent resources (agents, jobs, services) with high availability and efficient resource utilization.</li> <li>Design and develop custom resource definitions (CRDs) and Kubernetes operators to model and manage domain-specific workloads such as AI agent lifecycles, research pipelines, and long-running compute tasks.</li> <li>Drive the strategy for cluster scaling, node pool management, autoscaling policies, and resource quota frameworks to handle rapid workload growth.</li> <li>Build and maintain infrastructure-as-code (Terraform, Pulumi, or similar) for reproducible, version-controlled environment management.</li> <li>Design and implement robust scheduling, placement, and affinity strategies to optimize cost, performance, and fault tolerance for heterogeneous workloads (CPU, GPU, and memory-intensive).</li> <li>Establish and uphold best practices around observability, monitoring, alerting, and incident response for infrastructure systems (Prometheus, Grafana, Datadog, or similar).</li> <li>Own storage and networking strategy within Kubernetes, including persistent volume management, CSI drivers, service mesh, network policies, and ingress architecture.</li> <li>Troubleshoot complex, cross-system infrastructure issues and guide others through effective debugging and remediation in distributed environments.</li> <li>Collaborate closely with backend, ML, and research teams to understand workload requirements and translate them into reliable infrastructure patterns.</li> </ul> <hr> <h3><strong>Qualifications</strong></h3> <ul> <li>5+ years of professional infrastructure or platform engineering experience, with deep hands-on Kubernetes
expertise in production environments.</li> <li>Experience designing and implementing custom resource definitions (CRDs) and Kubernetes operators (using frameworks such as Kubebuilder, Operator SDK, or controller-runtime).</li> <li>Track record of operating and scaling Kubernetes clusters supporting thousands of persistent or long-lived resources (stateful workloads, persistent pods, long-running jobs).</li> <li>Deep understanding of Kubernetes internals (API server, etcd, scheduler, controller manager, kubelet) and how they behave at scale.</li> <li>Expertise with cloud infrastructure (AWS EKS, GCP GKE, or Azure AKS) and the associated networking, storage, and IAM primitives.</li> <li>Proficiency in at least one systems or backend language for operator development and infrastructure tooling.</li> <li>Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, or Crossplane) and GitOps workflows.</li> <li>Strong working knowledge of container networking (CNI plugins, service mesh, network policies), storage (CSI, persistent volumes, StatefulSets), and security (RBAC, Pod Security Standards, secrets management).</li> <li>Ability to operate autonomously, make sound technical judgments, and drive projects from concept through production.</li> </ul> <h3><strong>Bonus points for:</strong></h3> <ul> <li>Experience with data-intensive platforms, scientific computing, or ML/AI infrastructure.</li> <li>Prior experience at startups or on small teams, with significant architectural ownership in ambiguous settings.</li> <li>Experience scaling systems, teams, or platforms through periods of rapid growth.</li> </ul> <hr> <h3><strong>Salary</strong></h3> <p>$200,000–$350,000 &nbsp;•&nbsp; Offers equity</p> <hr> <h3><strong>Why join us?</strong></h3> <ul> <li>Competitive salary and equity</li> <li>Full healthcare coverage: we pay 100% of premiums for you and your dependents</li> <li>Support for growing families, including a yearly new-parent stipend and fertility coverage
through Carrot</li> <li>401(k) company matching</li> <li>$300 health and wellness benefit</li> <li>Lunch is on us every day you're in the office, and dinner is on us when you're working late</li> <li>Regular team offsites and company events</li> <li>A fast-moving, mission-driven culture where smart people do their best work and actually enjoy doing it</li> </ul>

Perks & benefits

  • 401k
  • Medical Insurance
  • Pension Matching
  • Equity Compensation
