Research Engineer/Scientist - Machine Learning RL & Optimisation (Contractor)
Huawei R&D UK
About the role
About Huawei Research and Development UK Limited
Founded in 1987, Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. We have 207,000 employees and operate in over 170 countries and regions, serving more than three billion people around the world.
Our vision and mission is to bring digital to every person, home and organization for a fully connected, intelligent world. To this end, we will drive ubiquitous connectivity and promote equal access to networks; bring cloud and artificial intelligence to all four corners of the earth to provide superior computing power where you need it, when you need it; build digital platforms to help all industries and organizations become more agile, efficient, and dynamic; redefine user experience with AI, making it more personalized for people in all aspects of their life, whether they’re at home, in the office, or on the go.
This spirit of innovation has led Huawei to work in close partnership with leading academic institutions in the UK to develop and refine the latest technologies. With a shared commitment to innovation and progress, both parties have worked together to achieve common goals and establish a strong partnership. The partnership between the UK and Huawei helps to develop the technologies of the future that will transform the way we all communicate, work and live.
For the past 30 years we have maintained an unwavering focus, rejecting shortcuts and easy opportunities that don't align with our core business. With a practical approach to everything we do, we concentrate our efforts and invest patiently to drive technological breakthroughs.
This strategic focus is a reflection of our core values:
Staying customer-centric,
Inspiring dedication,
Persevering,
Growing by reflection.
Huawei Research and Development UK Limited Overview
Huawei’s vision is a fully connected, intelligent world. To achieve this, we work to inspire passion for basic research around the world. Our combined passion drives development across the global innovation value chain. Huawei has the largest Research and Development organization in the world with 96,000+ employees in research centers around the globe. In the UK, we already have design centers in Cambridge, London, Edinburgh and Ipswich. We continue to explore and define new research directions and new services. We have expanded our collaborations with academic researchers; researched new network architectures, integration of communications and key enabling technologies; and developed the fundamental theories of these technologies. We invite you to join us on this exciting journey and drive your career forward.
Job Summary
Research and develop large-scale machine learning systems, alignment workflows, and optimization infrastructure to advance LLM reasoning and post-training capabilities. Design and execute scaled reinforcement learning pipelines (e.g., PPO, GRPO) utilizing distributed training frameworks (verl, trl, DeepSpeed, FSDP) integrated with high-performance inference engines (vLLM). Optimize low-level training throughput, kernel performance, and memory utilization across heterogeneous hardware clusters using expressive hardware DSLs (e.g., TileLang, Triton). Advance the LLM orchestration loop and leverage Bayesian optimization to automate the search, generation, and continuous improvement of high-performance NPU kernels.
Key Responsibilities:
Design and execute scaled RL finetuning workflows (e.g., PPO, GRPO) to enhance LLM reasoning, instruction-following, and alignment.
Architect and manage large-scale distributed training experiments across multi-node GPU clusters, optimizing for maximum throughput and hardware utilization.
Develop and maintain training infrastructure using advanced parallelization frameworks (verl, trl, DeepSpeed, FSDP) to support rapidly evolving research needs.
Integrate high-performance inference engines like vLLM directly into RL generation loops to reduce rollout latency and accelerate training cycles.
Implement robust profiling and debugging pipelines to diagnose bottlenecks in GPU memory, compute, and inter-node communication.
Collaborate with data and evaluation teams to design dense reward functions and synthetic data generation pipelines.
Design, benchmark, and deploy highly optimized custom tensor operators (e.g., FlashAttention, GEMM) across heterogeneous hardware architectures using modern Domain-Specific Languages (DSLs) and AI compilers.
This job description is only an outline of the tasks, responsibilities and outcomes required of the role. The jobholder will carry out any other duties as may be reasonably required by their line manager. The job description and person specification may be reviewed on an ongoing basis in accordance with the changing needs of Huawei Research and Development UK Limited.
Person Specification:
Required:
Master's or PhD (or equivalent industry research experience) in Machine Learning, Computer Science, Data Science, or a highly quantitative field with a heavy focus on Machine Learning.
Deep proficiency in PyTorch and experience writing custom training loops and data pipelines.
Hands-on experience with RLHF/RLVF methods (e.g., PPO, GRPO) and an understanding of policy optimization dynamics.
Technical familiarity with at least two of the following: DeepSpeed, FSDP, verl or trl.
Production or heavy research experience utilizing vLLM or similar high-throughput inference serving engines for generation.
Ability to thrive in a fast-paced, iterative environment where research and production infrastructure deeply intersect.
Desired:
Proven track record of running scaled GPU experiments across multi-node clusters.
Experience implementing next-gen alignment and reasoning paradigms, such as GRPO or Monte Carlo Tree Search (MCTS).
Deep understanding of GPU architectures, kernels, FlashAttention, and profiling tools.
Familiarity with cluster environments and schedulers like Slurm or Kubernetes.
Hands-on experience developing within an NPU stack or its ecosystem integrations.
Practical knowledge of hardware-expressive Domain-Specific Languages (e.g., TileLang, Triton) to optimize low-level memory placement, layout propagation, and thread-block scheduling.
Research publications at top-tier AI/ML conferences (NeurIPS, ICLR, ICML) or a strong open-source GitHub footprint in LLM training/infrastructure.
What we offer
33 days of annual leave (including UK public holidays)
Group Personal Pension
Life insurance
Private medical insurance
Medical expense claim scheme
Employee Assistance Program
Cycle to work scheme
Company sports club and social events
Additional time off for learning and development