Senior Software Engineer, AI Job Orchestration
Firmus Technologies
Sydney · 2d ago
- Seniority
- Senior
About the role
<p><strong>Firmus Technologies</strong></p>
<p>Firmus Technologies is a global leader pioneering the development and operation of efficient AI infrastructure across Asia Pacific.</p>
<p>Founded in Australia in 2019, our mission is to create the most efficient AI infrastructure by combining cutting-edge technology with a steadfast commitment to sustainability.</p>
<p>At Firmus, we are unique in our approach. We design, build, and operate a new class of digital infrastructure – the AI Factory. Through our model-to-grid technology approach, we have pushed the boundaries of multi-generational liquid cooling systems, energy management, AI software orchestration, and construction. For our customers, this approach allows us to make every watt count and deliver low-cost AI tokens globally.</p>
<p><strong>Firmus AI Cloud</strong></p>
<p>Our large-scale GPU cloud platform, Firmus AI Cloud, is purpose-built to deliver energy-efficient AI compute at scale to customers.</p>
<p>It empowers developers, enterprises, educational institutions, and government users to train and deploy AI models with unmatched efficiency and cost savings. With an ever-growing suite of services and applications, we are committed to delivering a cloud experience that is market-leading, proprietary, and built to scale.</p>
<p><strong>Role Summary</strong></p>
<p>As a Senior Software Engineer on the AI and Applications team, you'll own the control plane that powers AI workload submission across Firmus AI Platforms. You'll design and build unified job submission APIs, a CLI, and web interfaces for training, inference, and fine-tuning workloads on Kubernetes and Slurm, implementing RBAC, multi-tenant isolation, resource quotas, and intelligent scheduling policies (priority classes, preemption, fairness). You'll create a template catalog of pre-built training and inference recipes, wire observability pipelines for per-job GPU metrics and cost tracking, and expose telemetry APIs for platform monitoring. This role requires deep Kubernetes and Slurm expertise, strong distributed systems knowledge, and close collaboration with infra, platform, and LLM engineering teams to deliver a seamless, production-grade job orchestration experience for hyperscaler customers.</p>
<p><br><strong>Key Responsibilities</strong></p>
<ul>
<li>Design and build unified job submission APIs, CLI, and web UI for all AI workload types (training, inference, fine-tuning) on Kubernetes and Slurm with Firmus AI Factory context (tenant isolation, resource requests, metadata tagging, observability hooks).</li>
<li>Implement comprehensive job metadata models and schemas: track job ID, job type, tenant, user, resource requirements, priority class, timestamps, lineage, and execution status.</li>
<li>Integrate authentication/authorization (RBAC) and resource quotas; enforce multi-tenant isolation at submission time across all job types.</li>
<li>Build the AI job scheduling and orchestration layer: priority classes, preemption policies, fairness algorithms, resource quota enforcement, and intelligent job routing.</li>
<li>Build the AI Factory template catalog: discovery, parameter validation, and manifest generation for training templates, inference serving templates, and fine-tuning recipes.</li>
<li>Wire job submissions to the observability pipeline: inject labels/annotations (job_id, tenant, user, model_name, job_type) so metrics are tagged per-job.</li>
<li>Expose job-level telemetry APIs (GPU metrics, cost accrual, MFU progression for training; latency, throughput, tokenomics for inferencing) for platform telemetry and monitoring.</li>
<li>Extend job submission to handle inference workloads: design inference job specifications (model, batch size, latency SLA, cost constraints); integrate with inference serving APIs.</li>
<li>Coordinate with the platform team on observability dashboard integration, with LLM engineers on template design, and with ModelOps on reliability standards.</li>
</ul>
<p> <br><strong>Skills & Experience</strong></p>
<ul>
<li>5–7 years of backend engineering experience building production APIs and distributed systems (Python, Go, or Java).</li>
<li>Deep Kubernetes expertise: Job controllers, Pod specs, resource requests/limits, RBAC, network policies, and debugging.</li>
<li>Hands-on Slurm experience: job submission, resource allocation, job queues, sbatch scripting.</li>
<li>Strong distributed systems knowledge: scheduling algorithms, fairness, preemption, and resource management.</li>
<li>Strong data modelling: can design clear schemas for job metadata, handle versioning and migrations, ensure backward compatibility.</li>
<li>DevOps mindset: comfortable with observability, logging, tracing, and production troubleshooting.</li>
<li>Experience with streaming APIs, real-time webhooks, and system-level integration patterns.</li>
</ul>
<p><br><strong>Key Competencies</strong></p>
<ul>
<li>Job Orchestration & Scheduling: shipped job scheduling or workflow systems at scale; understands job lifecycle, failure modes, and scheduling policies.</li>
<li>Multi-Tenancy Design: can architect fair resource allocation, quota enforcement, preemption, and data isolation across job types.</li>
<li>API Design: RESTful or gRPC APIs that are intuitive and extensible; handles versioning gracefully.</li>
<li>Systems Architecture: understands how job submission connects to training, inferencing, observability, cost tracking, and incident response.</li>
<li>Cross-Domain Partnership: works closely with the infra team, platform team, and LLM engineers; defines clear handoff points and API contracts.</li>
</ul>
<p><br><strong>Success Metrics</strong></p>
<ul>
<li>Unified orchestration adoption increases: teams use the standard job interface rather than bespoke/manual pathways.</li>
<li>Scheduling effectiveness & fairness improves: predictable scheduling under contention with reduced noisy-neighbor impact.</li>
<li>Orchestration reliability stays high: jobs reliably start, run, and complete across K8s/Slurm/inference integrations.</li>
<li>End-to-end workflow automation increases: higher share of workflows complete without human intervention (e.g., train→register→serve).</li>
<li>Interface stability & compatibility remains strong: the orchestration API evolves without breaking users. </li>
</ul>
<p><br><strong>Location & Reporting</strong></p>
<ul>
<li>This role can be based in either Sydney, Australia, or Singapore.</li>
<li>Reports to the Head of AI & Applications.</li>
</ul>
<p><br><strong>Employment Basis</strong></p>
<p>Full-time</p>
<p><br><strong>Diversity</strong></p>
<p>At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions. </p>
<p>Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure. </p>