Inference Stack Engineer

EER Poland

Gdańsk · 2w ago

About the role

Inference Stack Engineer (AI Systems / Compiler & Runtime)

We are building a next-generation AI inference stack designed for high-performance execution on modern and custom compute architectures. Our mission is to deliver industry-leading low-latency and high-throughput AI systems by designing and optimizing the full execution path — from model representation to hardware-level execution.

This is a deeply technical role at the intersection of compiler systems, AI runtimes, and high-performance computing.

You will work on core infrastructure that defines how modern AI models are executed efficiently at scale.

What you will do

Design and build components of an AI inference stack, from high-level model representation to low-level execution
Develop and extend a Python-based DSL for expressing AI workloads and kernels
Work on compiler infrastructure, including:
- IR design and transformation pipelines
- graph lowering and optimization passes
- backend code generation for target execution environments
Optimize model execution for:
- latency
- throughput
- memory efficiency
- numerical stability
Contribute to runtime systems responsible for model execution and scheduling
Profile and analyze inference workloads to identify system bottlenecks
Collaborate closely with hardware and systems engineers on execution efficiency
Influence architecture decisions for next-generation AI execution platforms

What we are looking for

Strong software engineering background (C++ and Python)
Experience with performance-critical systems or compiler-related work
Understanding of AI model execution (especially transformers / LLMs)
Familiarity with compute graphs, tensor operations, or execution frameworks
Ability to analyze complex systems end-to-end (model → runtime → hardware)
Experience working with large codebases and system-level debugging
Strong communication skills and ability to work in cross-functional teams

Nice to have

Experience with compiler frameworks such as:
- LLVM
- MLIR
- Triton
- TVM
- XLA
Experience contributing to deep learning frameworks (PyTorch, TensorFlow, JAX)
Understanding of GPU or accelerator execution models
Experience with kernel optimization or operator-level performance tuning
Knowledge of distributed inference systems and their building blocks (e.g. NCCL collectives, RPC-based serving)
Familiarity with hardware-aware optimizations (memory hierarchy, vectorization, scheduling)

What we offer

Work on the core execution layer of modern AI systems
Direct impact on inference performance of large-scale AI workloads
Collaboration with experts in compilers, systems, and AI infrastructure
Highly technical environment with strong engineering autonomy
Opportunity to shape the architecture of a next-generation inference stack
Competitive compensation and flexible working model

Why this role is different

This is not a typical ML engineering or application role.

You will not be training models.

You will work on how models actually run: efficiently, at scale, and across compute systems. You will shape the performance layer that sits between AI models and hardware.
