Principal Scientific Data Architect

APAC
Gurgaon
20h ago
Seniority
Staff

About the role

<div class="content-intro"><h3 style="text-align: center;"><span style="text-decoration: underline;"><strong>About Xebia</strong></span></h3> <p style="text-align: center;"><strong>Xebia</strong> is a trusted advisor in the modern era of digital transformation, serving hundreds of leading brands worldwide with end-to-end IT solutions. The company has experts specializing in <strong>technology consulting, software engineering, AI, digital products and platforms, data, cloud, intelligent automation, agile transformation, and industry digitization.</strong> In addition to providing high-quality digital consulting and state-of-the-art software development, Xebia has a host of standardized solutions that substantially reduce the time-to-market for businesses.</p> <p style="text-align: center;">Xebia also offers a diverse portfolio of training courses to help support forward-thinking organizations as they look to upskill and educate their workforce to capitalize on the latest digital capabilities. The company has a strong presence across 16 countries with development centres across the <strong>US, Latin America, Western Europe, Poland, the Nordics, the Middle East, and Asia Pacific.</strong></p></div><p><strong>Job Description: Principal Scientific Data Architect (Google Cloud Platform Ecosystem)</strong></p> <p><strong>Role Overview</strong></p> <p>We are seeking a highly specialized <strong>Principal Scientific Data Architect</strong> to bridge the gap between advanced Google Cloud engineering and life sciences discovery. 
This role will redefine how scientific data is structured, scaled, and consumed across our R&amp;D, Onyx, and CMC (Chemistry, Manufacturing, and Controls) divisions.</p> <p>Operating natively within the <strong>Google Cloud Platform (GCP)</strong> and <strong>Databricks on GCP</strong> ecosystem, you will lead the transition toward a fully automated, software-defined data framework by implementing <strong>Schema as Code</strong>, <strong>Data as Code</strong>, and metadata-driven <strong>Configuration Data Engineering</strong>. The ideal candidate combines elite cloud data architecture expertise with deep scientific literacy, enabling the design of data systems that directly power <em>in-silico</em> molecular discovery and autonomous Agentic AI frameworks.</p> <p><strong>Key Responsibilities</strong></p> <ol> <li><strong>GCP-Native Data Architecture &amp; Paradigm Shifts</strong></li> </ol> <ul> <li><strong>Schema as Code:</strong> Design and implement version-controlled, programmatically managed data schemas natively integrated with <strong>Google BigQuery</strong>. Ensure schemas evolve seamlessly using GCP DevOps tools (Cloud Build, Artifact Registry) and Terraform.</li> <li><strong>Data as Code:</strong> Treat data assets with software engineering rigor. 
Implement data versioning, programmability, and automated quality testing using BigQuery features (such as table snapshots and time travel), dbt, and Delta Lake on GCP.</li> <li><strong>Configuration Data Engineering:</strong> Architect highly optimized, metadata-driven, configuration-led data pipelines using <strong>Google Cloud Composer (Airflow)</strong> or <strong>Dataflow</strong> to abstract infrastructure complexity.</li> </ul> <ol start="2"> <li><strong>Scientific Domain Integration</strong></li> </ol> <ul> <li>Translate complex biological and chemical concepts (e.g., molecular modalities, chemical structures, solubility traits) into highly scalable logical and physical data models within BigQuery and Databricks.</li> <li>Collaborate closely with computational chemists, biologists, and AI engineers to ensure the data architecture natively supports predictive <em>in-silico</em> modeling.</li> <li>Design robust data layouts that allow autonomous AI agents to easily "dip into" molecular data, extract properties, and explain molecular behavior.</li> </ul> <ol start="3"> <li><strong>Platform &amp; Ecosystem Strategy</strong></li> </ol> <ul> <li>Optimize the interoperability between <strong>Databricks on GCP</strong> (Lakehouse architecture) and enterprise-wide <strong>Google BigQuery</strong> storage and analytics. 
[<a href="https://www.linkedin.com/posts/sougata-rakshit-565675132_databricks-gcp-lakehouse-activity-7359234260265746432-7Y7a">1</a>]</li> <li>Inform the integration of semantic web technologies and knowledge graphs (e.g., <strong>Stardog</strong>) into the overarching Google Cloud data fabric.</li> <li>Ensure data availability and high-performance querying for downstream multi-agent AI ecosystems (<strong>Agentic Hubs</strong> built on Google Cloud's AI suite or custom frameworks).</li> </ul> <p><strong>Required Skills &amp; Qualifications</strong></p> <p><strong>Scientific Domain Knowledge [</strong><a href="https://talents.studysmarter.co.uk/companies/google-deepmind/research-scientist-llm-science-4688481/"><strong>1</strong></a><strong>]</strong></p> <ul> <li><strong>Mandatory:</strong> Strong background or proven experience working in life sciences, pharmaceuticals, biotech, or scientific research organizations.</li> <li>Ability to converse fluently with scientists regarding therapeutic modalities, molecular properties, and R&amp;D pipelines without needing to be a wet-lab scientist.</li> </ul> <p><strong>GCP &amp; Technical Architecture Expertise</strong></p> <ul> <li><strong>GCP Data Stack:</strong> Mastery of <strong>Google BigQuery</strong> (including BigLake, Analytics Hub, and nested JSON schemas) and <strong>Databricks on GCP</strong>.</li> <li><strong>Software-Defined Data:</strong> Proven track record of implementing <strong>Schema as Code</strong> and <strong>Data as Code</strong> paradigms using tools like <strong>Terraform</strong>, <strong>dbt</strong>, and Git-based CI/CD workflows.</li> <li><strong>Pipeline Automation:</strong> Deep experience with configuration-driven pipeline orchestrators, specifically <strong>Google Cloud Composer / Apache Airflow</strong>.</li> <li><strong>Modeling &amp; Semantics:</strong> Strong understanding of relational, dimensional, and graph-based data modeling. 
Familiarity with knowledge graphs (e.g., Stardog) or biomedical ontologies is a major plus.</li> </ul> <p><strong>Soft Skills &amp; Leadership</strong></p> <ul> <li><strong>Abstract Thinking:</strong> Ability to conceptualize and propose complex <em>in-silico</em> data solutions at a high strategic level without getting bogged down by immediate technology limitations.</li> <li><strong>Communication:</strong> Exceptional ability to articulate the business and scientific value of data architecture to non-technical executive stakeholders.</li> </ul> <p><strong>Preferred Qualifications</strong></p> <ul> <li>Google Cloud Professional Data Engineer or Professional Cloud Architect certification.</li> <li>Degree in Computer Science, Data Engineering, Bioinformatics, Computational Chemistry, or a related quantitative field.</li> <li>Experience setting up GCP data foundations specifically engineered to feed Large Language Models (e.g., Vertex AI / Gemini) and autonomous AI agents.</li> </ul> <p><strong>Location:</strong> Not a constraint</p><div class="content-conclusion"><p><strong>Some useful links:</strong></p> <p><strong><a href="https://xebia.com/">Xebia | Creating Digital Leaders.</a></strong></p> <p><a href="https://www.linkedin.com/company/xebia/mycompany/">https://www.linkedin.com/company/xebia/mycompany/</a></p> <p><a href="http://twitter.com/xebiaindia">http://twitter.com/xebiaindia</a></p> <p><a href="http://www.youtube.com/XebiaIndia">http://www.youtube.com/XebiaIndia</a></p></div>
