
Staff ML Engineer

Buildkite
ANZ Region · 1mo ago
Seniority
Staff

About the role

<h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold">About Buildkite</h2> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">Buildkite's CI platform is trusted by the world's leading engineering teams, shipping software to over 1,000,000,000 daily users.</p> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold">Job Overview</h2> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">We're hiring a Staff Engineer (ML) to join our Test Engine team. In this role, you'll define and lead the technical strategy for machine learning within Test Engine, specifically building the models and infrastructure behind predictive test selection: using code changes to determine which tests actually need to run.</p> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">Staff Engineers at Buildkite are hands-on technical leaders. You'll influence how we design, build, and scale systems while supporting other engineers to deliver their best work. You'll be the most senior ML practitioner in the company, setting the technical direction for how we approach test selection and establishing the patterns and infrastructure that the broader ML effort builds on.</p> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold">🔧 About the Team</h2> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">The Test Engine team helps engineering teams ship faster by giving them visibility and control over their test suites. Today, that means real-time flaky test detection and management, intelligent test splitting across parallel jobs, and performance analytics and tracing, all working across any CI/CD platform, not just Buildkite Pipelines.</p> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">Test Engine already ingests billions of test runs.
We have deep visibility into test suites, codebases, and the relationships between them. The next step is using that data to answer a fundamental question: for a given code change, which tests are most likely to fail?</p> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">We believe the industry is moving away from running full test suites on every change. The teams that can shift their outer testing loop into a fast, precise inner loop (running only the tests that matter) will ship value to their customers dramatically faster. For many of our customers, that speed is existential. Switching costs are low, competition is fierce, and the teams with faster feedback loops win.</p> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">This is where ML comes in. If we can model the relationship between code changes and test failures, we can give engineering teams a fundamentally faster development cycle. We're not trying to optimise individual tests; we're trying to build a generalised solution to test selection that works across codebases, frameworks, and languages.</p> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold">🚀 What You'll Do</h2> <h3 class="text-text-100 mt-2 -mb-1 text-base font-bold">Own Technical Direction for ML in Test Engine</h3> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Lead and define the ML strategy for predictive test selection, from early experimentation through to models running reliably in production at scale</li> <li class="whitespace-normal break-words pl-2">Lead the technical investigation into how we build a generalised test selection model, and shape the approach based on what the data tells you</li> <li class="whitespace-normal break-words pl-2">Lead the design of the ML architecture
end-to-end: feature engineering from code changes and test history, model training and evaluation, serving infrastructure, and feedback loops for continuous improvement</li> <li class="whitespace-normal break-words pl-2">Drive key decisions around model operationalisation: latency constraints (test selection has to be fast enough to sit in the critical path), prediction accuracy trade-offs, and graceful degradation when confidence is low</li> <li class="whitespace-normal break-words pl-2">Shape how ML capabilities integrate with Test Engine's existing data infrastructure: billions of ingested test runs, test-to-code mapping, and the intelligent splitting engine</li> </ul> <h3 class="text-text-100 mt-2 -mb-1 text-base font-bold">Build and Scale the ML Platform</h3> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Build the ML platform layer so that getting a model into production is fast and repeatable</li> <li class="whitespace-normal break-words pl-2">Design, build, and maintain the data pipelines that feed ML workloads, connecting code change signals with test execution history at scale</li> <li class="whitespace-normal break-words pl-2">Train, evaluate, and deploy models, taking ownership through to monitoring and retraining in production</li> <li class="whitespace-normal break-words pl-2">Instrument production models with observability metrics: prediction accuracy, latency, coverage, false negative rates, and drift detection</li> <li class="whitespace-normal break-words pl-2">Solve the hardest technical challenges at the intersection of code analysis and test data: feature extraction from diffs, generalisation across languages and frameworks, and handling the cold-start problem for new tests and repositories</li> </ul> <h3 class="text-text-100 mt-2 -mb-1 text-base font-bold">Lead and
Unblock</h3> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Investigate and resolve complex performance and reliability issues across the data and ML stack</li> <li class="whitespace-normal break-words pl-2">Share knowledge and drive engineering best practices across teams through documentation, mentorship, and pairing</li> <li class="whitespace-normal break-words pl-2">Support the wider engineering organisation by contributing to cross-team tooling, infrastructure, and frameworks</li> <li class="whitespace-normal break-words pl-2">Communicate trade-offs effectively and build alignment around technical decisions</li> <li class="whitespace-normal break-words pl-2">Work closely with customers to understand how test selection fits into their development workflows, and ensure the product delivers real impact</li> </ul> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold">🎨 Skills &amp; Experience We Value</h2> <h3 class="text-text-100 mt-2 -mb-1 text-base font-bold">Technical Expertise</h3> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Deep proficiency in Python, with strong experience building production ML systems end-to-end</li> <li class="whitespace-normal break-words pl-2">Proven experience designing and operating ML infrastructure at scale: model registries, feature stores, serving layers, experiment tracking, or similar</li> <li class="whitespace-normal break-words pl-2">Strong experience with data processing at scale, whether with batch or streaming frameworks (Spark, Flink, or similar)</li> <li class="whitespace-normal break-words pl-2">Deep proficiency in SQL</li> <li class="whitespace-normal break-words pl-2">Comfort
working in cloud environments (AWS) and with containerised workloads (Docker, Kubernetes)</li> <li class="whitespace-normal break-words pl-2">In short, we'd expect equal comfort and a high level of capability across the end-to-end process, from designing and building models through to deploying them</li> </ul> <h3 class="text-text-100 mt-2 -mb-1 text-base font-bold">ML &amp; Domain Experience</h3> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Hands-on experience training, evaluating, and deploying ML models in production; you're a practitioner, not only an infrastructure builder</li> <li class="whitespace-normal break-words pl-2">Experience with classification, ranking, or prediction problems where the signal-to-noise ratio is challenging; test selection shares characteristics with anomaly detection, change-point detection, and predictive filtering</li> <li class="whitespace-normal break-words pl-2">Track record of building ML capabilities that scaled beyond a single use case: not just one-off models but repeatable, generalised approaches</li> <li class="whitespace-normal break-words pl-2">Experience with feature engineering from structured and semi-structured data (code diffs, execution logs, dependency graphs, or similar)</li> <li class="whitespace-normal break-words pl-2">Experience instrumenting production models with observability: accuracy, latency, coverage, drift</li> </ul> <h3 class="text-text-100 mt-2 -mb-1 text-base font-bold">Collaboration and Communication</h3> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Excellent written and verbal communication skills, especially in a remote-first environment</li> <li
class="whitespace-normal break-words pl-2">Ability to distil complex technical concepts into clear explanations for diverse audiences</li> <li class="whitespace-normal break-words pl-2">A collaborative, pragmatic mindset, balancing technical quality with business context</li> <li class="whitespace-normal break-words pl-2">Comfortable mentoring engineers and leading technical discussions across teams</li> <li class="whitespace-normal break-words pl-2">Proven ability to build alignment across teams and influence technical direction without authority</li> </ul> <h3 class="text-text-100 mt-2 -mb-1 text-base font-bold">Nice to Have</h3> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Experience with code analysis, static analysis tools, or building features from source code structure</li> <li class="whitespace-normal break-words pl-2">Familiarity with CI/CD systems, developer tooling, or test infrastructure</li> <li class="whitespace-normal break-words pl-2">Experience with Ruby on Rails, React, GraphQL, or Go</li> <li class="whitespace-normal break-words pl-2">Background in search ranking, recommendation systems, or other domains where you're predicting relevance from sparse signals</li> <li class="whitespace-normal break-words pl-2">Experience working with test frameworks or test execution data</li> </ul> <h2 class="text-text-100 mt-3 -mb-1 text-[1.125rem] font-bold">✨ Why Join Buildkite</h2> <p class="font-claude-response-body break-words whitespace-normal leading-[1.7]">At Buildkite, we value kindness, autonomy, and collaboration.
You'll be joining a remote-first company where your work directly helps some of the world's best engineering teams build and ship software faster and more safely.</p> <ul class="[li_&amp;]:mb-0 [li_&amp;]:mt-1 [li_&amp;]:gap-1 [&amp;:not(:last-child)_ul]:pb-1 [&amp;:not(:last-child)_ol]:pb-1 list-disc flex flex-col gap-1 pl-8 mb-3"> <li class="whitespace-normal break-words pl-2">Competitive compensation, including salary, equity, and a benefits package</li> <li class="whitespace-normal break-words pl-2">Flexible, remote-first culture (Remote in the ANZ &amp; PST Regions)</li> <li class="whitespace-normal break-words pl-2">Meaningful technical challenges at scale</li> <li class="whitespace-normal break-words pl-2">Opportunities for professional growth, technical leadership, and cross-team influence</li> <li class="whitespace-normal break-words pl-2">A collaborative, inclusive, and innovative culture where your ideas make a real impact</li> </ul><div class="content-conclusion"><h3><strong>🌈 Equal Opportunity Employer</strong></h3> <p>At Buildkite, we value diversity and celebrate all types of skills, backgrounds, and experiences. We're dedicated to fostering an inclusive environment and providing reasonable accommodations throughout our recruitment process.</p> <p>If you need any accommodations or support during the application or interview process, please reach out to us at accommodations@buildkite.com.</p></div>

Perks & benefits

  • Equity Compensation
