Research Engineer Intern, Evaluations
Tensorstax Com
San Francisco · On-site · 1y ago
- Employment: Full-time
- Seniority: Junior
About the role
- Develop evaluation environments to test AI agents' ability to reason, plan, and act autonomously within mission-critical data pipelines.
- Design benchmarks to assess model capabilities in failure detection, pipeline optimization, and agentic decision-making in data workflows.
- Implement automated assessment frameworks for language model-based agents operating over data lakes and warehouses.
- Work with synthetic and real-world datasets to create robust testing environments for AI-driven data automation.
- Collaborate with research engineers to refine reward shaping strategies, guiding models toward more efficient and agentic behaviors in data-intensive tasks.
Requirements
- Experience in language model research, with a focus on benchmarking LLMs in mission-critical domains.
- Strong background in AI evaluation methodologies, reinforcement learning, and RLHF techniques.
- Familiarity with benchmarking language models for structured and unstructured data tasks.
- Proficiency in Python and experience with ML frameworks like PyTorch or JAX.
- Hands-on experience with data lakes, warehouses, and data engineering tools (Snowflake, BigQuery, dbt, Spark, Kafka).
- High agency—proactive, resourceful, and comfortable working in a fast-paced research environment with minimal supervision.
- Attention to detail—ability to design rigorous, reproducible experiments and evaluations.
Nice to have
- Contributions to open-source AI benchmarks (e.g., SWE-bench, BIRD, Spider).
- Contributions to open-source agentic frameworks.
- Experience developing custom RL environments for AI evaluation.
- Strong understanding of ETL, ELT, and data transformation pipelines.
What we offer
- Competitive internship stipend.
- 100% employer-covered health, dental, and vision insurance (for eligible interns).
- Access to Bay Club or Equinox in San Francisco.
- Opportunity to work at the cutting edge of AI evaluations and autonomous data engineering research.
Perks & benefits
- Vision Insurance