Member of Technical Staff, Data
Inception
Bay Area | $200k–350k | On-site | Posted 3mo ago
- Employment: Full-time
- Seniority: Staff
About the role
- Develop data mixes for training LLMs, drawing on open-source datasets, synthetically generated data, and curated human feedback.
- Design and implement data pipelines for processing petabyte-scale datasets.
- Build systems for web crawling, data ingestion, and real-time data processing to support model training.
- Develop tools and frameworks for efficient data storage, retrieval, and versioning across distributed systems.
- Create evaluation frameworks to measure data diversity, quality, and representativeness.
- Ensure data collection adheres to privacy regulations.
Qualifications
- BS/MS/PhD in Computer Science, Machine Learning, or a related field (or equivalent experience).
- 3+ years of experience building data processing pipelines at scale, particularly with AI/ML applications.
- Strong proficiency in Python and experience with data processing frameworks (Apache Spark, Beam, Airflow).
- Familiarity with synthetic data generation techniques and data augmentation strategies.
- Familiarity with web scraping, crawling technologies, and Common Crawl datasets.
- Solid understanding of machine learning fundamentals and experience with ML frameworks (PyTorch, TensorFlow).
- Experience with SQL and NoSQL databases for managing structured and unstructured data.
- Experience with large language models and understanding of tokenization, embeddings, and model architectures.
- Experience managing human annotation workflows and quality control processes.
- Experience with vector databases and embedding-based retrieval systems.
- Knowledge of data privacy regulations and ethical AI practices.
- Experience with distributed computing and large-scale data storage systems (HDFS, S3, BigQuery).
Compensation
- Work with World-Class Talent: Collaborate with the inventors of diffusion models and leading AI researchers
- Shape Foundational Technology: Your decisions will influence how the next generation of AI products is built and used
- Immediate Impact: Join at the ground floor where your contributions directly shape product direction and company trajectory
- Competitive salary and equity in a rapidly growing startup
- Flexible vacation and paid time off (PTO)
- Health, dental, and vision insurance
- 401k match
- Catered meals (breakfast, lunch, & dinner)
- Commuter subsidies
- A collaborative and inclusive culture
Perks & benefits
- 401k
- Vision Insurance
- Unlimited Vacation
- Paid Time Off
- Pension Matching
- Equity Compensation