Research Engineer - Evaluations

Lumalabs Ai

London · $190k–375k · Hybrid · 9mo ago
Employment
Full-time

About the role

  • Design and implement scalable pipelines for automated evaluation of generative models, with a focus on visual and multimodal outputs (image, video, text, audio).
  • Develop novel metrics and evaluation models that capture qualities like fidelity, coherence, temporal consistency, and alignment with human intent.
  • Integrate evaluation signals into training loops (including reinforcement learning and reward modeling) to continuously improve model performance.
  • Build infrastructure for large-scale regression testing, benchmarking, and monitoring of multimodal generative models.
  • Collaborate with researchers running human studies to translate human evaluation frameworks into automated or semi-automated systems.
  • Partner with model researchers to identify failure cases and build targeted evaluation harnesses.
  • Maintain dashboards, reporting tools, and alerting systems to surface evaluation results to stakeholders.
  • Stay current with emerging evaluation techniques in generative AI, multimodal LLMs, and perceptual quality assessment.

Qualifications

  • Master's or PhD in Computer Science, Machine Learning, or a related technical field (or equivalent industry experience).
  • 5+ years of experience building ML evaluation systems, model pipelines, or large-scale infrastructure.
  • Hands-on experience working with visual data (images and/or video), including evaluation, modeling, or data preparation.
  • Proficiency in Python and ML frameworks (PyTorch, JAX, or TensorFlow).
  • Familiarity with human-in-the-loop evaluation workflows and how to scale them with automation.
  • Strong background in machine learning, with experience in generative models (diffusion, LLMs, multimodal architectures).
  • Strong software engineering skills (CI/CD, testing, data pipelines, distributed systems).
  • Experience with reinforcement learning or reward modeling.
  • Prior work on perceptual metrics, multimodal evaluation benchmarks, or retrieval-based evaluation.
  • Background in large-scale model training or evaluation infrastructure.
  • Experience designing metrics for perceptual quality.
  • Familiarity with creative media workflows (film, VFX, animation, digital art).
  • Contributions to open-source evaluation libraries or benchmarks.
