Team Lead, DevOps Engineer

Intermedia Intelligent Communications

Worldwide · Remote · 1w ago

Employment: Full-time
Seniority: Lead

About the role

Integration APIs and microservices
Multi-region Kubernetes clusters (cloud & on‑prem)
CI/CD and GitOps tooling (GitHub-centric)
Internal Developer Platform (IDP) components
Critical shared infrastructure (RTE, Redis, Harbor, Consul, Chart Museum, API Gateway, etc.)

What you will be doing:

Lead and grow a small DevOps engineering team supporting Unite Services and Unite Integrations.
Drive team planning and execution across operational efficiency, security, reliability, and platform maturity epics (e.g., quarterly OKRs/initiatives).
Provide technical mentorship on Kubernetes, GitHub, CI/CD, and cloud infrastructure.
Collaborate closely with Engineering, SRE, Security, and Ops on roadmap, incident resolution, and cross-team initiatives.

Own the operational health of Unite Services and Unite Integrations, leading production deployments and release processes for backend services (e.g., Unite Notifications, integrations platform, Salesforce-related services).
Oversee multi-environment support (DEV/QA/PROD and special environments like RTE).

Operate and evolve Kubernetes clusters (cloud and on‑prem), including:
- Cluster upgrades and migration playbooks
- Ingress/Nginx and service mesh / networking changes
- Resource/capacity optimization (CPU throttling, scaling, etc.)
Manage supporting infrastructure for Unite platforms:
- Redis, Consul, Harbor, Chart Museum, API gateways, message queues (e.g., RabbitMQ)
Contribute to and adopt Internal Developer Platform (IDP) capabilities (e.g., API Gateway, Apache Flink management, Crossplane-based infrastructure).

Lead migration and consolidation of repos and pipelines to GitHub (from TFS/Azure DevOps and others).
Design, implement, and maintain GitHub-based CI/CD:
- GitHub Actions workflows
- Self-hosted GitHub runners (including K8s-based and Windows runners)
- Versioning and branching strategies
Introduce and maintain GitOps practices for infrastructure and application deployments.
Automate repetitive tasks and operational workflows via scripting and infrastructure-as-code.

Drive security initiative epics (e.g., ransomware preparedness, vulnerability remediation such as Redis Lua RCE, and tagging in Qualys/CrowdStrike).
Work with Security to design and implement:
- Rate limiting, WAF rules, and API protection (e.g., Cloudflare for SCIM and public APIs)
- Secrets management and secure configuration
Define and execute disaster recovery and ransomware recovery strategies for UST / Unite Services / Integrations.
Improve reliability and quality of platforms via SLIs/SLOs, operational runbooks, and post-incident improvements.

Own observability for Unite Services and Unite Integrations:
- Metrics, logs, and traces (e.g., Prometheus, Grafana, ELK)
- Dashboards for capacity (Kubernetes CPU usage, throttling, etc.) and error budgets
Lead incident response for platform issues:
- Triage, mitigation, communication, and follow-up actions
- Coordination with product/engineering teams during outages
Create and maintain detailed runbooks and recovery procedures, especially for:
- RTE environment, K8s clusters, Harbor, Consul, Chart Museum
- Disaster recovery and ransomware scenarios

Produce and maintain clear, actionable documentation:
- Infrastructure design and configuration
- CI/CD and GitOps processes
- Recovery and maintenance procedures
- RFCs for new platform capabilities (e.g., API Gateway, Apache Flink)
Contribute to process improvements in CI/CD, deployment strategies, and developer experience.
Help define and maintain product maturity models for Unite Integrations and related platforms.

What you will bring to the role:

Strong background in platform / DevOps engineering for complex distributed systems.
Deep understanding of infrastructure components: compute, networking, storage, DNS, certificates, load balancing, and how they interact.
Kubernetes & Containers
- Deep knowledge of Kubernetes (cluster operations, upgrades, scaling, security, ingress, RBAC).
- Experience with Docker/containerization in production.
Cloud & On‑Prem
- Solid experience with at least one major cloud provider (Azure preferred; AWS or GCP a plus).
- Comfort working in hybrid environments (cloud + on‑prem).
CI/CD & GitHub
- Strong experience designing and running CI/CD pipelines.
- Hands-on with GitHub Actions and self-hosted runners (including scaling and security).
- Experience with other CI/CD tools (Azure DevOps, GitLab, ArgoCD) is a plus.
Infrastructure as Code & Automation
- Proficiency with IaC tools (Terraform, Crossplane, Helm, etc.).
- Strong scripting/coding skills (e.g., Python, Go, Bash, PowerShell) to automate complex operational workflows.
Config Management
- Experience with tools like Ansible, Chef, or similar for configuration management and automation.
Observability & Monitoring
- Experience with Prometheus, Grafana, and centralized logging stacks (e.g., ELK or Splunk).
- Ability to build meaningful dashboards and alerts, and interpret telemetry to drive improvements.
Security & Compliance
- Understanding of security best practices across network, infrastructure, and application layers.
- Experience with WAFs, rate limiting, and vulnerability management tools (e.g., Qualys, CrowdStrike) is a plus.

Proven experience leading DevOps / platform teams or being a strong technical lead.
Ability to mentor engineers and raise the technical bar for the team.
Experience leading small/medium-sized projects using Agile/Scrum or similar methodologies.
Strong communication skills, able to translate technical constraints into clear options and decisions for stakeholders.
Demonstrated ability to own outcomes: from design through implementation, rollout, and ongoing operations.
Strong troubleshooting skills and the ability to stay calm under pressure during incidents.

Experience defining or implementing Internal Developer Platforms.
Experience with API Gateway and stream processing platforms (e.g., Apache Flink).
Background in ransomware preparedness, disaster recovery planning, or business continuity for critical systems.
Experience operating systems at multi-region or multi-tenant scale.
