Systems Research Engineer

Huawei R&D UK

Edinburgh · 5mo ago

About the role

Job Vision

In an era where LLMs are rebuilding the foundational software stack, Huawei’s CloudMatrix super-node clusters and AI-native infrastructure are reshaping how large-scale models are trained, served, and deployed. The Edinburgh Research Centre plays a key role in this transformation, driving new AI infrastructure and agentic serving architectures and helping define Huawei’s next-generation large-scale data centre and AI infrastructure systems. Positioned at the intersection of advanced systems research and industrial-scale engineering, our team turns innovative system designs into deployable, real-world technologies.

We are seeking Systems Research Engineers with a strong interest in computer systems, distributed AI infrastructure, and performance optimization. These roles are ideal for recent PhD graduates or exceptional BSc/MSc engineers looking to build research-driven engineering experience in areas such as operating systems, distributed systems, AI model serving, and machine learning infrastructure. You will work closely with senior architects on real-world projects, helping to prototype and optimize next-generation AI infrastructure.

Key Responsibilities

·       Distributed Systems Research & Development:
Architect, implement, and evaluate distributed system components for emerging AI and data-centric workloads. Drive modular design and scalability across CPU, GPU, and NPU clusters, building highly efficient serving and scheduling systems.

·       Performance Optimization & Profiling:
Conduct in-depth profiling and performance tuning of large-scale inference and data pipelines, focusing on KV cache management, heterogeneous memory scheduling, and high-throughput inference serving using frameworks such as vLLM, Ray Serve, and PyTorch Distributed.

·       Scalable Model Serving Infrastructure:
Develop and evaluate frameworks that enable efficient multi-tenant, low-latency, and fault-tolerant AI serving across distributed environments. Research and prototype new techniques for cache sharing, data locality, and resource orchestration and scheduling within AI clusters.

·       Research & Publications:
Translate innovative research ideas into publishable contributions at leading venues (e.g., OSDI, NSDI, EuroSys, SoCC, MLSys, NeurIPS, ICML, ICLR) while driving internal adoption of novel methods and architectures.

·       Cross-Team Collaboration:
Communicate technical insights, research progress, and evaluation outcomes effectively to multidisciplinary stakeholders and global Huawei research teams.

Person Specification

Required Qualifications and Skills:

·       Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.

·       Strong knowledge of distributed systems, operating systems, machine learning systems architecture, inference serving, and AI infrastructure.

·       Hands-on experience with LLM serving frameworks (e.g., vLLM, Ray Serve, TensorRT-LLM, TGI) and distributed KV cache optimization.

·       Proficiency in C/C++, with additional experience in Python for research prototyping.

·       Solid grounding in systems research methodology, distributed algorithms, and profiling tools.

·       Team-oriented mindset with effective technical communication skills.

Desired Qualifications and Experience:

·       PhD in systems, distributed computing, or large-scale AI infrastructure.

·       Publications in top-tier systems or ML conferences (NSDI, OSDI, EuroSys, SoCC, MLSys, NeurIPS, ICML, ICLR).

·       Understanding of load balancing, state management, fault tolerance, and resource scheduling in large-scale AI inference clusters.

·       Prior experience designing, deploying, and profiling high-performance cloud or AI infrastructure systems.
