AI Researcher – Multilingual Data

Featherless AI
Worldwide (Remote)
Employment: Full-time

About the role

We’re looking for an AI Researcher focused on multilingual data to help us build and scale next-generation language models across diverse languages and domains. You’ll own research and execution around data sourcing, curation, evaluation, and training strategies for multilingual and low-resource languages, with a strong emphasis on publishing high-quality research and translating it into production systems.

This role is ideal for someone who enjoys working close to the frontier: balancing papers, prototypes, and real-world impact in a fast-moving startup environment.

What You’ll Do

  • Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement

  • Develop strategies for low-resource and long-tail languages (sampling, augmentation, curriculum design)

  • Research and improve cross-lingual transfer, alignment, and robustness in large language models

  • Build and maintain evaluation benchmarks for multilingual performance

  • Collaborate with engineers and researchers on training pipelines and model architecture decisions

  • Publish research at top venues (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR) and contribute to open-source when appropriate

  • Translate research insights into practical improvements in production models

What We’re Looking For

  • Strong background in NLP / ML research, with a focus on multilingual or cross-lingual modeling

  • Publication record at respected conferences or journals (ACL, EMNLP, NeurIPS, ICML, ICLR, etc.)

  • Experience working with large-scale text datasets across multiple languages

  • Solid understanding of:

    • Tokenization and vocabulary design for multilingual models

    • Data quality metrics, filtering, and dataset bias

    • Transfer learning and multilingual representation learning

  • Comfortable prototyping in Python with modern ML frameworks (PyTorch, JAX, etc.)

  • Ability to operate independently and ship research at startup pace

Nice to Have

  • Experience with low-resource languages or non-Latin scripts

  • Open-source contributions in NLP or data tooling

  • Experience training or evaluating large language models

  • Familiarity with multilingual benchmarks (e.g., XTREME, FLORES, TyDi QA)

Why Join Us

  • Real ownership over research direction and impact

  • A team that values both papers and production

  • Access to meaningful scale: large datasets, modern infrastructure, and fast iteration

  • Competitive compensation and meaningful equity at an early stage

Perks & benefits

  • Equity Compensation
