Research Engineer - Evaluations

Lumalabs Ai

London | $190k–375k | Hybrid | Posted 9 months ago

Employment: Full-time

About the role

Design and implement scalable pipelines for automated evaluation of generative models, with a focus on visual and multimodal outputs (image, video, text, audio).
Develop novel metrics and evaluation models that capture qualities like fidelity, coherence, temporal consistency, and alignment with human intent.
Integrate evaluation signals into training loops (including reinforcement learning and reward modeling) to continuously improve model performance.
Build infrastructure for large-scale regression testing, benchmarking, and monitoring of multimodal generative models.
Collaborate with researchers running human studies to translate human evaluation frameworks into automated or semi-automated systems.
Partner with model researchers to identify failure cases and build targeted evaluation harnesses.
Maintain dashboards, reporting tools, and alerting systems to surface evaluation results to stakeholders.
Stay current with emerging evaluation techniques in generative AI, multimodal LLMs, and perceptual quality assessment.

Qualifications

Master's or PhD in Computer Science, Machine Learning, or a related technical field (or equivalent industry experience).
5+ years of experience building ML evaluation systems, model pipelines, or large-scale infrastructure.
Hands-on experience working with visual data (images and/or video), including evaluation, modeling, or data preparation.
Proficiency in Python and ML frameworks (PyTorch, JAX, or TensorFlow).
Familiarity with human-in-the-loop evaluation workflows and how to scale them with automation.
Strong background in machine learning, with experience in generative models (diffusion, LLMs, multimodal architectures).
Strong software engineering skills (CI/CD, testing, data pipelines, distributed systems).

Preferred qualifications

Experience with reinforcement learning or reward modeling.
Prior work on perceptual metrics, multimodal evaluation benchmarks, or retrieval-based evaluation.
Background in large-scale model training or evaluation infrastructure.
Experience designing metrics for perceptual quality.
Familiarity with creative media workflows (film, VFX, animation, digital art).
Contributions to open-source evaluation libraries or benchmarks.

Compensation

About Luma
