Software Engineering - Data Engineer

Voltai Com

Menlo Park · On-site · 1y ago
Employment: Full-time

About the role

  • Collect, parse, and structure diverse data types—including text, images, tables, circuit diagrams, simulations, and signal data—into standardized formats suitable for machine learning applications
  • Design and maintain scalable data pipelines that efficiently handle data ingestion, transformation, and integration into ML workflows, ensuring high throughput and reliability
  • Optimize data storage solutions to balance performance, scalability, and cost-effectiveness, facilitating rapid access and processing of large datasets
  • Collaborate with cross-functional teams, including ML and infra engineers, to curate high-quality training and evaluation datasets aligned with Voltai's product offerings
  • Implement robust data validation and quality assurance processes to ensure the integrity and usability of datasets across various applications

Requirements

  • Programming Languages: Proficiency in Python, with experience in compiled languages such as Go or Rust
  • Data Parsing and Extraction: Expertise in parsing and extracting data from various formats and modalities, including PDFs, HTML, images, and binary files, utilizing tools like BeautifulSoup, pdfminer.six, and custom parsers
  • Data Pipeline Frameworks: Experience with modern data pipeline frameworks such as Apache Airflow, Prefect, Dagster, or Apache Beam, enabling efficient orchestration of complex data workflows
  • Data Processing Tools: Familiarity with tools like Apache Spark, Apache Flink, or similar platforms for large-scale data processing and transformation
  • Database Systems: Strong knowledge of relational and non-relational databases, including PostgreSQL, Supabase, and other scalable storage solutions
  • Cloud Platforms: In-depth experience with cloud services, particularly AWS, including S3, EC2, Lambda, and related services for deploying and managing data infrastructure
  • Web Crawling and Agentic Crawling: Proficiency in building and managing web crawlers using frameworks like Scrapy, Firecrawl, or Crawl4AI, with an understanding of agentic crawling techniques to automate data extraction tasks
  • Data Quality and Governance: Commitment to maintaining high data quality standards, with experience in implementing data validation, cleansing, and governance practices

Nice to have

  • A strong background in hardware/electronics, gained through professional, academic, or personal projects
  • Experience constructing datasets for large-scale ML models, specifically LLMs
  • Contributions to open-source initiatives
  • Experience thriving in a fast-paced, hyper-growth startup environment

Benefits

  • Unlimited PTO: Recharge when you need it, no questions asked.
  • Comprehensive Health Coverage: Medical, dental, and vision insurance for you and your dependents.
  • Free Meals and Snacks: Daily lunches, dinners, and snacks in the office.
  • Professional Growth: We invest in your continuous learning and offer opportunities to expand your skills.
  • Visa Sponsorship: We welcome global talent and provide visa sponsorship to support qualified candidates.

Perks & benefits

  • Vision Insurance
  • Medical Insurance
  • Unlimited Vacation
  • Paid Time Off
