Member of Technical Staff - Compilers

Architect

Palo Alto · On-site · 2mo ago

Employment: Full-time
Seniority: Staff

About the role

About Us

Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads and to enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford research, Architect blends AI with silicon; our founding team comes from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple, and Intel.

We're looking for staff/principal-level compiler engineers with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates have shipped production compilers at places like Apple, Google (XLA/TPU), Groq, Cerebras, Qualcomm, AMD, or similar.

What You'll Do

As a Member of Technical Staff on the Compilers team at Architect, you'll own the compiler stack targeting our SIMD/VLIW NPU, from graph ingestion through code generation on production silicon. You'll work directly with the NPU architect to co-design the ISA, closing the loop between compiler needs and hardware decisions.

Own the compiler end-to-end: graph ingestion (ONNX, PyTorch) through IR optimization, AI-driven code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU.
Implement and own the memory management layer: for instance, software-managed on-chip scratchpad memory, where the compiler handles data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks.
Design and iterate on mid-end and backend optimization passes: operator fusion, loop transformations, vectorization, and software pipelining to close the gap between peak and achieved throughput.
Co-design the ISA and instruction encoding with the architect and silicon team. Feed real workload performance data back into architectural decisions.
Support quantization and mixed-precision lowering (32-bit floating point and integer, along with lower-precision INT8/INT4, BF16, and FP16/FP8/FP4) with correct numerics end-to-end.
Benchmark compiler output against cycle-accurate models, RTL simulation, and FPGA prototypes. Own quality-of-results (QoR) tracking.
Grow into a compiler team lead as the team scales.

What We'd Like to See

Qualifications & Skills:

Degree: Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a closely related field.
Experience: 5+ years building compilers or code generation toolchains for custom accelerators. Compiler experience targeting ML/AI hardware is required; general-purpose compiler work alone (GCC/LLVM for CPUs) is not sufficient.
Domain Background: Hands-on experience with at least one of: Apple Neural Engine compiler, Google XLA / Edge TPU / TPU codegen, Groq TSP compiler (spatial scheduling, IR dialect design), Cerebras compiler stack, Qualcomm Hexagon NN / AI Engine, AMD AIE / Vitis AI, or an equivalent custom accelerator compiler.
Backend Mechanics: Strong grasp of instruction scheduling, register allocation, and software pipelining — especially for SIMD/VLIW or spatial architectures.
ML Optimizations: Experience with tiling strategies, loop nest optimization, and operator fusion for ML workloads (such as convolution, attention, element-wise ops, reductions, and transpositions).
SW-Managed Memory: Experience with scratchpad memory allocation, data layout, DMA orchestration, and multi-buffering.
Coding: Strong C++. Python proficiency. Familiarity with MLIR or LLVM infrastructure.
Leadership: Ability to lead and grow the compiler team over time.

Bonus:

HW/SW co-design experience: defining ISA features, instruction encodings, or hardware interfaces driven by compiler needs.
IR design for ML accelerators (custom dialects, MLIR-based flows, or graph-level IRs like XLA HLO).
ML framework experience (PyTorch, TensorFlow) and portable graph formats (ONNX).
Experience benchmarking and profiling compiler output on real hardware, FPGA, or cycle-accurate simulators.
Understanding of ML inference systems and workload-level optimizations: FlashAttention, RadixAttention, PagedAttention, continuous batching, speculative decoding, KV cache management, and prefill/decode scheduling.
Contributions to open-source ML compiler projects (TVM, MLIR, Triton, XLA).
Domain-specific expertise: Track record of bringing up energy-efficient, high-performance hardware accelerators.

What We Offer

Competitive salary and meaningful equity stake
Fast-paced startup with autonomy and visible impact
Cutting-edge challenges at the intersection of AI and silicon design
Direct ownership of the compiler stack as we scale

Perks & benefits

Equity Compensation
