Software Engineering - Data Engineer
Voltai Com
Menlo Park | On-site | 1y ago
- Employment: Full-time
About the role
- Collect, parse, and structure diverse data types—including text, images, tables, circuit diagrams, simulations, and signal data—into standardized formats suitable for machine learning applications
- Design and maintain scalable data pipelines that efficiently handle data ingestion, transformation, and integration into ML workflows, ensuring high throughput and reliability
- Optimize data storage solutions to balance performance, scalability, and cost-effectiveness, facilitating rapid access and processing of large datasets
- Collaborate with cross-functional teams, including ML and infra engineers, to curate high-quality training and evaluation datasets aligned with Voltai's product offerings
- Implement robust data validation and quality assurance processes to ensure the integrity and usability of datasets across various applications
Qualifications
- Programming Languages: Proficiency in Python, with experience in compiled languages such as Go or Rust
- Data Parsing and Extraction: Expertise in parsing and extracting data from various formats and modalities, including PDFs, HTML, images, and binary files, utilizing tools like BeautifulSoup, pdfminer.six, and custom parsers
- Data Pipeline Frameworks: Experience with modern data pipeline frameworks such as Apache Airflow, Prefect, Dagster, or Apache Beam, enabling efficient orchestration of complex data workflows
- Data Processing Tools: Familiarity with tools like Apache Spark, Apache Flink, or similar platforms for large-scale data processing and transformation
- Database Systems: Strong knowledge of relational and non-relational databases, including PostgreSQL, Supabase, and other scalable storage solutions
- Cloud Platforms: In-depth experience with cloud services, particularly AWS, including S3, EC2, Lambda, and related services for deploying and managing data infrastructure
- Web Crawling and Agentic Crawling: Proficiency in building and managing web crawlers using frameworks like Scrapy, Firecrawl, or Crawl4AI, with an understanding of agentic crawling techniques to automate data extraction tasks
- Data Quality and Governance: Commitment to maintaining high data quality standards, with experience in implementing data validation, cleansing, and governance practices
Preferred qualifications
- A strong background in hardware/electronics, gained through professional, academic, or personal projects
- Experience constructing datasets for large-scale ML models, specifically LLMs
- Contributions to open-source initiatives
- Experience thriving in a fast-paced, hyper-growth startup environment
Benefits
- Unlimited PTO: Recharge when you need it, no questions asked.
- Comprehensive Health Coverage: Medical, dental, and vision insurance for you and your dependents.
- Free Meals and Snacks: Daily lunches, dinners, and snacks in the office.
- Professional Growth: We invest in your continuous learning and offer opportunities to expand your skills.
- Visa Sponsorship: We welcome global talent and provide visa sponsorship to support qualified candidates.
Perks & benefits
- Vision Insurance
- Medical Insurance
- Unlimited Vacation
- Paid Time Off