Principal Platform Engineer
edisonscientific
San Francisco · 3mo ago
- Seniority: Staff
About the role
<h3><strong>About</strong></h3>
<p>Edison Scientific builds and commercializes AI agents for science. Scientific discovery moves too slowly, and autonomous AI agents are how we intend to fix that. We're assembling a team of top researchers and engineers across AI and biology to build an AI scientist.</p>
<hr>
<h3><strong>Role</strong></h3>
<p>As a <strong>Principal Platform Engineer</strong>, you'll play a key role in designing, scaling, and operating the core platform infrastructure that powers autonomous scientific discovery. Your primary focus will be orchestrating our agents at scale: building and managing clusters that run thousands of persistent, stateful workloads, developing custom resource definitions (CRDs) and operators, and ensuring the reliability and efficiency of our compute layer.</p>
<p>Our mission is to build an AI scientist, and you'll own the infrastructure foundation it runs on. AI agents performing long-running scientific research demand resilient scheduling, lifecycle management, and resource orchestration far beyond typical cloud-native workloads. This role will influence platform architecture, establish infrastructure best practices, and partner closely with backend engineers, ML engineers, and researchers to deliver a production-grade environment that lets science move faster.</p>
<p>At Edison Scientific, engineering at the senior level is about technical ownership and leverage: understanding how complex systems interact, making sound architectural tradeoffs, and building foundations that allow teams and science to move faster.</p>
<p>This role is on-site at our San Francisco office in the Dogpatch neighborhood. Our office is a converted warehouse with high ceilings, open space, and a team that genuinely believes in what they're building.</p>
<p>This position is part of the Platform team.</p>
<hr>
<h3><strong>Responsibilities</strong></h3>
<ul>
<li>Architect, implement, and operate Kubernetes clusters that support thousands of concurrent, persistent resources (agents, jobs, services) with high availability and efficient resource utilization.</li>
<li>Design and develop custom resource definitions (CRDs) and Kubernetes operators to model and manage domain-specific workloads such as AI agent lifecycles, research pipelines, and long-running compute tasks.</li>
<li>Drive the strategy for cluster scaling, node pool management, autoscaling policies, and resource quota frameworks to handle rapid workload growth.</li>
<li>Build and maintain infrastructure-as-code (Terraform, Pulumi, or similar) for reproducible, version-controlled environment management.</li>
<li>Design and implement robust scheduling, placement, and affinity strategies to optimize cost, performance, and fault tolerance for heterogeneous workloads (CPU, GPU, memory-intensive).</li>
<li>Establish and uphold best practices around observability, monitoring, alerting, and incident response for infrastructure systems (Prometheus, Grafana, Datadog, or similar).</li>
<li>Own storage and networking strategy within Kubernetes — including persistent volume management, CSI drivers, service mesh, network policies, and ingress architecture.</li>
<li>Troubleshoot complex, cross-system infrastructure issues and guide others through effective debugging and remediation in distributed environments.</li>
<li>Collaborate closely with backend, ML, and research teams to understand workload requirements and translate them into reliable infrastructure patterns.</li>
</ul>
<hr>
<h3><strong>Qualifications</strong></h3>
<ul>
<li>5+ years of professional infrastructure or platform engineering experience, with deep hands-on Kubernetes expertise in production environments.</li>
<li>Experience designing and implementing custom resource definitions (CRDs) and Kubernetes operators (using frameworks such as Kubebuilder, Operator SDK, or controller-runtime).</li>
<li>Track record of operating and scaling Kubernetes clusters supporting thousands of persistent or long-lived resources (stateful workloads, persistent pods, long-running jobs).</li>
<li>Deep understanding of Kubernetes internals — API server, etcd, scheduler, controller manager, kubelet — and how they behave at scale.</li>
<li>Expertise with cloud infrastructure (AWS EKS, GCP GKE, or Azure AKS) and associated networking, storage, and IAM primitives.</li>
<li>Proficiency in at least one systems or backend language for operator development and infrastructure tooling.</li>
<li>Hands-on experience with infrastructure-as-code tools (Terraform, Pulumi, or Crossplane) and GitOps workflows.</li>
<li>Strong working knowledge of container networking (CNI plugins, service mesh, network policies), storage (CSI, persistent volumes, StatefulSets), and security (RBAC, Pod Security Standards, secrets management).</li>
<li>Ability to operate autonomously, make sound technical judgments, and drive projects from concept through production.</li>
</ul>
<h3><strong>Bonus points for:</strong></h3>
<ul>
<li>Experience with data-intensive platforms, scientific computing, or ML/AI infrastructure.</li>
<li>Prior experience in startups or small teams with significant architectural ownership and ambiguity.</li>
<li>Experience scaling systems, teams, or platforms through periods of rapid growth.</li>
</ul>
<hr>
<h3><strong>Salary</strong></h3>
<p>$200,000 - $350,000 • Offers equity</p>
<hr>
<h3><strong>Why join us?</strong></h3>
<ul>
<li>Competitive salary and equity</li>
<li>Full healthcare coverage — we pay 100% of premiums for you and your dependents</li>
<li>Support for growing families, including a yearly new parent stipend and fertility coverage through Carrot</li>
<li>401(k) company matching</li>
<li>$300 health and wellness benefit</li>
<li>Lunch is on us every day you're in the office, and dinner is on us when you're working late</li>
<li>Regular team offsites and company events</li>
<li>A fast-moving, mission-driven culture where smart people do their best work and actually enjoy doing it</li>
</ul>
Perks & benefits
- 401k
- Medical Insurance
- Pension Matching
- Equity Compensation