Data Scientist

https://dbs.wd3.myworkdayjobs.com/dbs_careers

Guangzhou / Guangzhou (DTC) | Hybrid | 4w ago
Employment
Full-time

About the role

Role Overview
We are seeking a versatile Data Scientist to lead the development of high-quality, audio-driven digital avatars. This role combines cutting-edge Generative AI with foundational Machine Learning to create responsive, identity-consistent virtual humans. You will bridge the gap between "brain" and "body" by integrating RAG-based agents with multimodal synthesis models (ViT/VLM) to build avatars that not only look real but also interact intelligently.

Core Responsibilities

  • Multimodal Synthesis: Develop state-of-the-art (SOTA) audio-to-video pipelines using Vision Transformers (ViT) and Vision-Language Models (VLMs) to drive lip-sync, micro-expressions, and head poses.

  • Intelligent Interaction: Architect RAG (Retrieval-Augmented Generation) systems using LangChain and AI Agents to provide avatars with a searchable knowledge base and autonomous reasoning capabilities.

  • Customized Avatar Generation: Build person-specific fine-tuning workflows (LoRA, Adapters) to ensure 1:1 identity preservation from minimal reference footage.

  • Hybrid Modeling: Apply a mix of Deep Learning (CNNs for texture, RNN/LSTM for temporal audio sequences) and Classical ML (XGBoost/Random Forest for metadata classification or signal gating).

  • End-to-End Optimization: Own the pipeline from raw audio/text input to real-time rendered output, ensuring low-latency performance on GPU clusters.

Required Technical Stack

  • Generative AI & Agents:

    • Frameworks: Mastery of LangChain or LlamaIndex for building RAG pipelines.

    • Agents: Experience deploying autonomous agents to handle multi-step reasoning tasks.

  • Computer Vision & Multimodal:

    • Architectures: Deep expertise in ViT (feature encoding) and VLM (CLIP/BLIP for alignment).

    • Deep Learning: Hands-on experience with CNNs (spatial features), RNNs/LSTMs (temporal audio-visual sync), and GANs/Diffusion.

  • Core Machine Learning:

    • Algorithms: Proficiency in Random Forest, XGBoost, and SVMs for auxiliary data tasks (e.g., emotion classification or quality gating).

    • Frameworks: PyTorch (primary), TensorFlow, and Scikit-learn.

  • Data & Infrastructure:

    • Vector DBs: Experience with Pinecone, Milvus, or Weaviate for RAG storage.

    • Tools: FFmpeg for video processing and NVIDIA DeepStream for deployment.

Qualifications

  • Experience: 5+ years in Data Science with a focus on Multimodal ML or Digital Humans.

  • Education: Master’s or PhD in CS, AI, or a related quantitative field.

  • Problem Solving: Proven ability to overcome the "uncanny valley" effect through strong temporal consistency and identity-aware fine-tuning.

Location: Guangzhou (DTC)

Job: Analytics

Schedule: Regular

Employee Status: Full time
