Member of Technical Staff - Compilers

Architect
Palo Alto | On-site | Posted 2 months ago
Employment: Full-time
Seniority: Staff

About Us

Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads and to enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford research, Architect blends AI and silicon expertise, with a founding team from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple, and Intel.

We're looking for staff/principal-level compiler engineers with deep experience building code generation toolchains for custom AI accelerators. Ideal candidates have shipped production compilers at places like Apple, Google (XLA/TPU), Groq, Cerebras, Qualcomm, AMD, or similar.

What You'll Do

As a Member of the Technical Staff on the Compilers team at Architect, you'll own the compiler stack targeting our SIMD/VLIW NPU — from graph ingestion through code generation on production silicon. You'll work directly with the NPU architect to co-design the ISA, closing the loop between compiler needs and hardware decisions.

  • Own the compiler end-to-end: graph ingestion (ONNX, PyTorch) through IR optimization, AI-driven code generation, instruction scheduling, and register allocation for a SIMD/VLIW NPU.

  • Implement and own the memory management layer, for instance software-managed on-chip scratchpad memory in which the compiler handles data tiling, bank allocation, DMA scheduling, and double-buffering across SRAM banks.

  • Design and iterate on mid-end and backend optimization passes: operator fusion, loop transformations, vectorization, and software pipelining to close the gap between peak and achieved throughput.

  • Co-design the ISA and instruction encoding with the architect and silicon team. Feed real workload performance data back into architectural decisions.

  • Support quantization and mixed-precision lowering (32-bit FP32 or INT32, along with lower-precision INT8/INT4, BF16, and FP16/FP8/FP4 formats) with correct numerics end-to-end.

  • Benchmark compiler output against cycle-accurate models, RTL simulation, and FPGA prototypes. Own quality-of-results (QoR) tracking.

  • Grow into a compiler team lead as the team scales.

What We'd Like to See

Qualifications & Skills:

  • Degree: Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a closely related field.

  • Experience: 5+ years building compilers or code generation toolchains for custom accelerators. Compiler experience targeting ML/AI hardware is required; general-purpose compiler experience alone (GCC/LLVM for CPUs) is not sufficient.

  • Domain Background: Hands-on experience with at least one of the following: Apple Neural Engine compiler, Google XLA / Edge TPU / TPU codegen, Groq TSP compiler (spatial scheduling, IR dialect design), Cerebras compiler stack, Qualcomm Hexagon NN / AI Engine, AMD AIE / Vitis AI, or an equivalent custom accelerator compiler.

  • Backend Mechanics: Strong grasp of instruction scheduling, register allocation, and software pipelining — especially for SIMD/VLIW or spatial architectures.

  • ML Optimizations: Experience with tiling strategies, loop nest optimization, and operator fusion for ML workloads (e.g., convolution, attention, element-wise ops, reductions, and transpositions).

  • SW-Managed Memory: Experience with scratchpad-style memory allocation, data layout, DMA orchestration, and multi-buffering.

  • Coding: Strong C++. Python proficiency. Familiarity with MLIR or LLVM infrastructure.

  • Leadership: Ability to lead and grow the compiler team over time.

Bonus:

  • HW/SW co-design experience: defining ISA features, instruction encodings, or hardware interfaces driven by compiler needs.

  • IR design for ML accelerators (custom dialects, MLIR-based flows, or graph-level IRs like XLA HLO).

  • ML framework experience (PyTorch, TensorFlow) and portable graph formats (ONNX).

  • Experience benchmarking and profiling compiler output on real hardware, FPGA, or cycle-accurate simulators.

  • Understanding of ML inference systems and workload-level optimizations: FlashAttention, RadixAttention, PagedAttention, continuous batching, speculative decoding, KV cache management, and prefill/decode scheduling.

  • Contributions to open-source ML compiler projects (TVM, MLIR, Triton, XLA).

  • Domain-specific expertise: Track record of energy-efficient, high-performance HW accelerator bring-up.

What We Offer

  • Competitive salary and meaningful equity stake

  • Fast-paced startup with autonomy and visible impact

  • Cutting-edge challenges at the intersection of AI and silicon design

  • Direct ownership of the compiler stack as we scale

Perks & benefits

  • Equity Compensation
