AI Quality Engineer
Rootly
Office: Hybrid
Posted 4d ago
- Employment: Full-time
About the role
About Rootly
- Design and execute prompt-based test scenarios that cover happy paths, edge cases, and adversarial inputs across Rootly's agentic AI features
- Evaluate AI outputs for accuracy, relevance, consistency, and alignment with expected workflow behaviour
- Build and maintain an evaluation framework: structured test libraries, scoring rubrics, and regression suites to track AI performance over time
- Identify failure modes, hallucinations, reasoning gaps, and unexpected agent behaviours; document findings and work with engineers to resolve them
- Partner with Product and Engineering on new AI feature releases, contributing to acceptance criteria and quality gates before launch
- Define and track quality metrics (accuracy rates, failure frequency, regression trends) and report findings to stakeholders
- Stay current on LLM evaluation techniques, prompt engineering best practices, and agentic testing methodologies
- 5+ years in QA, product operations, AI/ML evaluation, or a closely related role
- Hands-on experience testing or evaluating LLM-powered or agentic AI products
- Strong prompt engineering instincts; you understand how wording, context, and structure affect model behaviour
- Comfortable writing scripts or working with evaluation tools (Python a plus; not required to be a full-stack engineer)
- Sharp analytical thinking; you can spot a subtle reasoning failure and articulate exactly why it's a problem
- Clear written communicator; able to translate AI behaviour findings for both technical and non-technical audiences
- Familiarity with incident management, DevOps, or IT operations workflows is a strong asset
- Experience with evaluation frameworks (e.g. LangSmith, PromptFlow, Braintrust, or similar)
- Exposure to red-teaming or adversarial testing of AI systems
- Comfortable writing E2E tests with Playwright
- Background working at a B2B SaaS or developer-tools company
- Familiar with mobile app testing (iOS/Android)
Why Rootly?
- Competitive compensation and early equity in a fast-growing, venture-backed company.
- Comprehensive medical, dental, and vision coverage.
- 3 weeks of vacation, plus unlimited sick and mental health days, and a company-wide end-of-year shutdown to recharge.
- $500 stipend for home office setup.
- Unlimited token usage and access to AI tools.
- A fast-moving, high-impact environment where your leadership and ideas directly shape the future of the company.
Perks & benefits
- Vision Insurance
- Home Office Budget
- Equity Compensation