- Seniority
- Staff
About the role
<p><strong><span data-contrast="none"><span data-ccp-parastyle="heading 2">About us</span></span></strong><span data-ccp-props="{"134245418":true,"134245529":true,"335559738":200,"335559739":0}"> </span></p>
<p><span data-contrast="auto">Graphcore is one of the world’s leading innovators in Artificial Intelligence compute. It is developing hardware, software and systems infrastructure that will unlock the next generation of AI breakthroughs and power the widespread adoption of AI solutions across every industry.</span><span data-ccp-props="{}"> </span></p>
<p><span data-contrast="auto">As part of the SoftBank Group, Graphcore is a member of an elite family of companies responsible for some of the world’s most transformative technologies. Together, they share a bold vision: to enable Artificial Super Intelligence and ensure its benefits are accessible to everyone.</span><span data-ccp-props="{}"> </span></p>
<p><span data-contrast="auto">Graphcore’s teams are drawn from diverse backgrounds and bring a broad range of skills and perspectives. A melting pot of AI research specialists, silicon designers, software engineers and systems architects, Graphcore enjoys a culture of continuous learning and constant innovation.</span><span data-ccp-props="{}"> </span></p>
<p><strong><span data-contrast="none"><span data-ccp-parastyle="heading 2">Job Summary</span></span></strong><span data-ccp-props="{"134245418":true,"134245529":true,"335559738":200,"335559739":0}"> </span></p>
<p><span data-contrast="auto">We are seeking an experienced Principal Hardware Diagnostics Engineer to design and develop diagnostics software used to monitor hardware health and diagnose system-level issues across Graphcore’s AI infrastructure platforms.</span><span data-ccp-props="{}"> </span></p>
<p><span data-contrast="auto">This role focuses on building diagnostics agents, tools, and analytics frameworks that enable engineers and automation systems to identify, isolate, and resolve hardware issues across blade-level servers and rack-scale clusters.</span><span data-ccp-props="{}"> </span></p>
<p><strong><span data-contrast="none"><span data-ccp-parastyle="heading 2">The Team</span></span></strong><span data-ccp-props="{"134245418":true,"134245529":true,"335559738":200,"335559739":0}"> </span></p>
<p><span data-contrast="auto"><span data-ccp-parastyle="heading 2">Graphcore</span><span data-ccp-parastyle="heading 2"> is a globally </span><span data-ccp-parastyle="heading 2">recognised</span><span data-ccp-parastyle="heading 2"> leader in Artificial Intelligence computing systems. The company designs advanced semiconductors and data </span><span data-ccp-parastyle="heading 2">centre</span><span data-ccp-parastyle="heading 2"> hardware that provide the </span><span data-ccp-parastyle="heading 2">specialised</span><span data-ccp-parastyle="heading 2"> processing power needed to drive AI innovation, while delivering the efficiency required to support its broader adoption. </span></span><span data-ccp-props="{"134245418":true,"134245529":true,"335559738":200,"335559739":0}"> </span></p>
<p><span data-contrast="auto">The Systems Engineering and Platform Validation team ensures Graphcore’s AI compute platforms are reliable, diagnosable, and operationally robust at scale.</span><span data-ccp-props="{}"> </span></p>
<p><span data-contrast="auto">The team collaborates with hardware engineering, firmware, cloud infrastructure, and automation teams to develop tools and frameworks that monitor system health, detect hardware failures, and accelerate root-cause analysis across AI clusters.</span><span data-ccp-props="{}"> </span></p>
<p><strong><span data-contrast="none"><span data-ccp-parastyle="heading 2">Responsibilities and Duties</span></span></strong><span data-ccp-props="{"134245418":true,"134245529":true,"335559738":200,"335559739":0}"> </span></p>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="1" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Design and develop automated hardware diagnostics solutions for blade-level servers and rack-scale AI systems.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="2" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Architect and implement diagnostic agents, monitoring tools, and analytics frameworks to track hardware telemetry.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="3" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Collaborate with hardware teams to integrate low-level diagnostic modules into monitoring systems.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="4" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Develop diagnostics tools capable of detecting hardware health conditions and isolating failures.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="5" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Create diagnostic modules used for internal validation and </span><span data-ccp-parastyle="List Bullet">production</span><span data-ccp-parastyle="List Bullet"> data center operations.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="6" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Provide detailed hardware fault information to system engineers to accelerate troubleshooting.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="7" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Define remediation workflows and insights for hardware fault scenarios across nodes and clusters.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="8" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Collaborate with firmware, networking, and cloud platform teams to integrate diagnostics across the system stack.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<p><strong><span data-contrast="none"><span data-ccp-parastyle="heading 2">Candidate Profile</span></span></strong><span data-ccp-props="{"134245418":true,"134245529":true,"335559738":200,"335559739":0}"> </span></p>
<p><strong><span data-contrast="auto">Essential</span></strong><span data-ccp-props="{}"> </span></p>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="9" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, or related discipline.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="10" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Strong software engineering experience in Python, C++, or C#.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="11" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Experience developing diagnostics or </span><span data-ccp-parastyle="List Bullet">monitoring</span><span data-ccp-parastyle="List Bullet"> systems for hardware platforms.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="12" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Experience working with distributed systems or cloud infrastructure.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="13" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Strong knowledge of Linux environments and system-level diagnostics tools.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="14" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Experience collaborating with CM/ODM partners on manufacturing diagnostics and fault isolation.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="15" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Strong analytical and debugging skills.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="16" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Excellent communication and collaboration abilities.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<p><strong><span data-contrast="auto">Desirable</span></strong><span data-ccp-props="{}"> </span></p>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="17" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Experience working with AI hardware platforms or accelerator-based computing systems.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="18" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Familiarity with hyperscale data center infrastructure.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="19" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Experience building cluster-level monitoring or </span><span data-ccp-parastyle="List Bullet">diagnostics</span><span data-ccp-parastyle="List Bullet"> systems.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<ul>
<li data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{"335552541":1,"335559685":360,"335559991":360,"469769226":"Symbol","469769242":[8226],"469777803":"left","469777804":"","469777815":"singleLevel"}" data-aria-posinset="20" data-aria-level="1"><span data-contrast="auto"><span data-ccp-parastyle="List Bullet">Experience interacting with internal or external customers during </span><span data-ccp-parastyle="List Bullet">diagnostics</span><span data-ccp-parastyle="List Bullet"> solution development.</span></span><span data-ccp-props="{}"> </span></li>
</ul>
<p>In addition to a competitive salary, Graphcore offers flexible working and a comprehensive benefits package designed to support your health, wellbeing and financial future. Our benefits include medical, dental and vision coverage, Flexible Spending Accounts (FSAs), Health Savings Accounts (HSAs), disability and life insurance, a 401(k) retirement plan, commuter benefits, wellness services and an Employee Assistance Programme (EAP).</p>
<p>We welcome people of different backgrounds and experiences; we're committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.</p>
Perks & benefits
- 401k
- Vision Insurance