Senior Staff Research Engineer – Reinforcement Learning for AI Agents

XPENG

Santa Clara · 2w ago

Seniority: Staff

About the role

<div data-page-id="JtHmdFPBFoDeeNxg3hrc4Xyjnhg" data-docx-has-block-data="false"> <div class="ace-line ace-line old-record-id-X1AxdI3wHoDJuhxEQzscIQfNnxb"> <div data-page-id="IlWVdPHa4oaUIgxAp1KccvMknZZ" data-docx-has-block-data="false"> <div class="ace-line ace-line old-record-id-TMEXdNB2hoez9lx0yNEcRvhgnht"> <div data-page-id="Pd7XdC4E7oI0w5xg9NlcS9nMn4E" data-lark-html-role="root" data-docx-has-block-data="false"> <div class="ace-line ace-line old-record-id-IlaFdwrzCokPoyxCHqDcqhYWnTe"> <div data-page-id="N2BkderONoTEZGxHa1bcGuqdn7N" data-lark-html-role="root" data-docx-has-block-data="false"> <div class="ace-line ace-line old-record-id-FwFLdHX5YoFzunxw1vGcWjt0n7d"><strong>XPENG</strong> is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics. With a strong focus on intelligent mobility, XPENG is dedicated to reshaping the future of transportation through cutting-edge R&D in AI, machine learning, and smart connectivity.</div> <div class="ace-line ace-line old-record-id-C1epd9DmDoererxpm3Yc1cQRnXF"> </div> <div class="ace-line ace-line old-record-id-C1epd9DmDoererxpm3Yc1cQRnXF"> <div class="ace-line ace-line old-record-id-MHzWdYxFNoJE3cx0QoJcVpDJnEb">We are looking for exceptional <strong>Research Engineers / Scientists</strong> to design learning systems that allow agents to plan over long horizons, learn effective strategies, and improve through experience.</div> <div class="ace-line ace-line old-record-id-EHGNdgjBHodMzhxQluacsnnVnte">This role sits at the intersection of <strong>reinforcement learning</strong><strong>, large language models, and real-world autonomous systems</strong>. Autonomous systems must operate reliably in complex, dynamic environments. 
We believe the next generation of autonomy will involve <strong>learning agents that continuously improve through interaction, feedback, and large-scale data</strong>. You will help build the <strong>learning systems that power these agents</strong>.</div> </div> <div class="ace-line ace-line old-record-id-WROJdUEKyoTzWrxxrwKc6M8on2b"> </div> <div class="ace-line ace-line old-record-id-Ty5QdShYlombw1xCF6Ecliepnmh"> <div data-page-id="M1bAdYQvroViM3xbQUOcVwYXn8d" data-lark-html-role="root" data-docx-has-block-data="false"> <h4 class="heading-2 ace-line old-record-id-FWjod0ITeo071Vxe2fLc50GCnHc"><strong>Key Responsibilities:</strong></h4> <ul class="list-bullet1"> <li class="ace-line ace-line old-record-id-HV5ldhiI9orW4Yx7wA6cnNf7nle" data-list="bullet"> <div>Reinforcement learning methods for <strong>LLM-driven agents and decision systems.</strong></div> </li> <li class="ace-line ace-line old-record-id-F0vkdCkk0o1MP5xA0dBcZUEjn5d" data-list="bullet"> <div>Policy optimization for <strong>long-horizon reasoning and planning.</strong></div> </li> <li class="ace-line ace-line old-record-id-DrkadusZto3Or6xOIzzcCUHMn4f" data-list="bullet"> <div>Learning from <strong>human or </strong><strong>AI</strong><strong> feedback (RLHF / RLAIF).</strong></div> </li> <li class="ace-line ace-line old-record-id-VbdLdKE85oxfNqxgTyfcjHrinpf" data-list="bullet"> <div>Agent training pipelines built on top of our <strong>agent infrastructure platform.</strong></div> </li> <li class="ace-line ace-line old-record-id-R9aedTjuKoEKRWxlkAycpbkqnDd" data-list="bullet"> <div>Evaluation and benchmarking systems for agent capabilities.</div> </li> <li class="ace-line ace-line old-record-id-CP7id8S2KoK1n6xyP3ace5G3nAh" data-list="bullet"> <div>Learning loops that integrate <strong>real-world and simulation data.</strong></div> </li> <li class="ace-line ace-line old-record-id-CP7id8S2KoK1n6xyP3ace5G3nAh" data-list="bullet"> <div>Contribute to AI systems that <strong>continuously improve after 
deployment</strong>.</div> </li> </ul> </div> </div> <div class="ace-line ace-line old-record-id-U9eRdgoV4oUYFyxVW4BcBeJrnAb"> <div data-page-id="M1bAdYQvroViM3xbQUOcVwYXn8d" data-lark-html-role="root" data-docx-has-block-data="false"> <h4 class="heading-2 ace-line old-record-id-KkHTd3KiFobqYsxgvfkcCkqXnme">Basic Qualifications</h4> <ul class="list-bullet1"> <li class="ace-line ace-line old-record-id-T0VhdMJOEoHPjexTd65c3YGJn9f" data-list="bullet"> <div>MS or PhD in Computer Science, AI, Machine Learning, Robotics, or a related field.</div> </li> <li class="ace-line ace-line old-record-id-Re3Qd3vuwonQXTxAE5OclvL6nfd" data-list="bullet"> <div>Strong background in <strong>reinforcement learning</strong><strong> or </strong><strong>machine learning.</strong></div> </li> <li class="ace-line ace-line old-record-id-ATs5diJvKoVstzxc3QWctp4anUc" data-list="bullet"> <div>Experience implementing RL algorithms such as <strong>PPO, Actor-Critic, or policy gradient methods.</strong></div> </li> <li class="ace-line ace-line old-record-id-DvNidY2wVoXXcrxODOkcBpKMnUk" data-list="bullet"> <div>Strong programming skills in <strong>Python</strong> with <strong>PyTorch</strong><strong> or JAX.</strong></div> </li> <li class="ace-line ace-line old-record-id-XQjRdz090o23ApxgLaFcebySnmc" data-list="bullet"> <div>Experience building <strong>ML</strong><strong> training systems or infrastructure.</strong></div> </li> </ul> </div> </div> <div class="ace-line ace-line old-record-id-NLiJdQ3C8oqZbDxjJuVcB7llnOg"> <div data-page-id="M1bAdYQvroViM3xbQUOcVwYXn8d" data-lark-html-role="root" data-docx-has-block-data="false"> <h4 class="heading-2 ace-line old-record-id-X8kHdfWpdoyTrsxvCCpcYGonnYg">Preferred Qualifications</h4> <ul class="list-bullet1"> <li class="ace-line ace-line old-record-id-B9OjdUmdxooh3pxfwErcLjz9nNf" data-list="bullet"> <div>Experience with <strong>RLHF or preference learning.</strong></div> </li> <li class="ace-line ace-line old-record-id-IEeLdLfgjoZ3A4xRd3DcvSVPnzg" 
data-list="bullet"> <div>Experience with <strong>LLM agents or tool-using AI systems.</strong></div> </li> <li class="ace-line ace-line old-record-id-N76md0lvBoXjQQxel8ecp0YunVg" data-list="bullet"> <div>Experience with multi-agent systems or long-horizon planning.</div> </li> <li class="ace-line ace-line old-record-id-KwatdU5lKo1aXwxe0INcaVlPnhg" data-list="bullet"> <div>Experience with simulation environments for RL.</div> </li> <li class="ace-line ace-line old-record-id-Y26DdqcxtoX2ntxbytQcMXSfn5c" data-list="bullet"> <div>Publications in <strong>NeurIPS, ICML, ICLR, ACL</strong>, or related venues.</div> </li> </ul> </div> </div> <div class="ace-line ace-line old-record-id-NLiJdQ3C8oqZbDxjJuVcB7llnOg"> </div> <div class="ace-line ace-line old-record-id-NLiJdQ3C8oqZbDxjJuVcB7llnOg"><strong>What we provide:</strong></div> <ul class="list-bullet1"> <li class="ace-line ace-line old-record-id-Wz0HdT0Mao1VdExfCaUca1F0n1b" data-list="bullet"> <div>A fun, supportive, and engaging environment.</div> </li> <li class="ace-line ace-line old-record-id-ZyPrdKGzcopKhexjk7xctpGNnnf" data-list="bullet"> <div>Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving.</div> </li> <li class="ace-line ace-line old-record-id-IKeQdg5R1oUKa9xbzapciF98nFg" data-list="bullet"> <div>Opportunity to work on cutting-edge technologies with top talent in the field.</div> </li> <li class="ace-line ace-line old-record-id-UB8Rd1WfQo2qMuxbmbUcm67lnCe" data-list="bullet"> <div>Competitive compensation package.</div> </li> <li class="ace-line ace-line old-record-id-Uz0JdPNktoOK3AxaacKcd1yQnDP" data-list="bullet"> <div>Snacks, lunches, and fun activities.</div> </li> </ul> <div class="ace-line ace-line old-record-id-GwIWdXAjpol3VMxbAasc4rqpnGg"> </div> <div class="ace-line ace-line old-record-id-DO8Nd6rquoDjkqxzRHFcTYPmnpb">The base salary range for this full-time position is $244,140 - $413,160, in 
addition to bonus, equity and benefits. Our salary ranges are determined by role, level, and location. The range displayed on each job posting reflects the minimum and maximum target for new hire salaries for the position across all US locations. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training.</div> <div class="ace-line ace-line old-record-id-SW6gdVM0XoWLiOxveyzcFcuDn4f"> </div> <div class="ace-line ace-line old-record-id-KwuJdxW61o8D4axaPSbcZkz5nXc">We are an Equal Opportunity Employer. It is our policy to provide equal employment opportunities to all qualified persons without regard to race, age, color, sex, sexual orientation, religion, national origin, disability, veteran status or marital status or any other prescribed category set forth in federal or state regulations.</div> </div> </div> </div> </div> </div> </div> </div>

Perks & benefits

Equity Compensation
