Team Lead, DevOps Engineer

Intermedia Intelligent Communications
Worldwide, Remote, 1w ago
Employment: Full-time
Seniority: Lead

About the role

  • Integration APIs and microservices
  • Multi-region Kubernetes clusters (cloud & on‑prem)
  • CI/CD and GitOps tooling (GitHub-centric)
  • Internal Developer Platform (IDP) components
  • Critical shared infrastructure (RTE, Redis, Harbor, Consul, Chart Museum, API Gateway, etc.)

What you will be doing:

  • Lead and grow a small DevOps engineering team supporting Unite Services and Unite Integrations.
  • Drive team planning and execution across operational efficiency, security, reliability, and platform maturity epics (e.g., quarterly OKRs/initiatives).
  • Provide technical mentorship on Kubernetes, GitHub, CI/CD, and cloud infrastructure.
  • Collaborate closely with Engineering, SRE, Security, and Ops on roadmap, incident resolution, and cross-team initiatives.
  • Own the operational health of Unite Services and Unite Integrations, including leading production deployments and release processes for backend services (e.g., Unite Notifications, integrations platform, Salesforce-related services).
  • Oversee multi-environment support (DEV/QA/PROD and special environments like RTE).
  • Operate and evolve Kubernetes clusters (cloud and on‑prem), including:
    • Cluster upgrades and migration playbooks
    • Ingress/Nginx and service mesh / networking changes
    • Resource/capacity optimization (CPU throttling, scaling, etc.)
  • Manage supporting infrastructure for Unite platforms:
    • Redis, Consul, Harbor, Chart Museum, API gateways, message queues (e.g., RabbitMQ)
  • Contribute to and adopt Internal Developer Platform (IDP) capabilities (e.g., API Gateway, Apache Flink management, Crossplane-based infrastructure).
  • Lead migration and consolidation of repos and pipelines to GitHub (from TFS/Azure DevOps and others).
  • Design, implement, and maintain GitHub-based CI/CD:
    • GitHub Actions workflows
    • Self-hosted GitHub runners (including K8s-based and Windows runners)
    • Versioning and branching strategies
  • Introduce and maintain GitOps practices for infrastructure and application deployments.
  • Automate repetitive tasks and operational workflows via scripting and infrastructure-as-code.
  • Drive security initiative epics (e.g., ransomware preparedness, remediation of vulnerabilities such as the Redis Lua RCE, tagging in Qualys/CrowdStrike).
  • Work with Security to design and implement:
    • Rate limiting, WAF rules, and API protection (e.g., Cloudflare for SCIM and public APIs)
    • Secrets management and secure configuration
  • Define and execute disaster recovery and ransomware recovery strategies for UST / Unite Services / Integrations.
  • Improve reliability and quality of platforms via SLIs/SLOs, operational runbooks, and post-incident improvements.
  • Own observability for Unite Services and Unite Integrations:
    • Metrics, logs, traces (e.g., Prometheus, Grafana, ELK)
    • Dashboards for capacity (Kubernetes CPU usage, throttling, etc.) and error budgets
  • Lead incident response for platform issues:
    • Triage, mitigation, communication, and follow-up actions
    • Coordination with product/engineering teams during outages
  • Create and maintain detailed runbooks and recovery procedures, especially for:
    • RTE environment, K8s clusters, Harbor, Consul, Chart Museum
    • Disaster recovery and ransomware scenarios
  • Produce and maintain clear, actionable documentation:
    • Infrastructure design and configuration
    • CI/CD and GitOps processes
    • Recovery and maintenance procedures
    • RFCs for new platform capabilities (e.g., API Gateway, Apache Flink)
  • Contribute to process improvements in CI/CD, deployment strategies, and developer experience.
  • Help define and maintain product maturity models for Unite Integrations and related platforms.

What you will bring to the role:

  • Strong background in platform / DevOps engineering for complex distributed systems.
  • Deep understanding of infrastructure components: compute, networking, storage, DNS, certificates, load balancing, and how they interact.
  • Kubernetes & Containers
    • Deep knowledge of Kubernetes (cluster operations, upgrades, scaling, security, ingress, RBAC).
    • Experience with Docker/containerization in production.
  • Cloud & On‑Prem
    • Solid experience with at least one major cloud provider (Azure preferred; AWS or GCP a plus).
    • Comfort working in hybrid environments (cloud + on‑prem).
  • CI/CD & GitHub
    • Strong experience designing and running CI/CD pipelines.
    • Hands-on with GitHub Actions and self-hosted runners (including scaling and security).
    • Experience with other CI/CD tools (Azure DevOps, GitLab, ArgoCD) is a plus.
  • Infrastructure as Code & Automation
    • Proficiency with IaC tools (Terraform, Crossplane, Helm, etc.).
    • Strong scripting/coding skills (e.g., Python, Go, Bash, PowerShell) to automate complex operational workflows.
  • Config Management
    • Experience with tools like Ansible, Chef, or similar for configuration management and automation.
  • Observability & Monitoring
    • Experience with Prometheus, Grafana, and centralized logging stacks (e.g., ELK or Splunk).
    • Ability to build meaningful dashboards and alerts, and interpret telemetry to drive improvements.
  • Security & Compliance
    • Understanding of security best practices across network, infrastructure, and application layers.
    • Experience with WAFs, rate limiting, vulnerability management tools (e.g., Qualys, CrowdStrike) is a plus.
  • Proven experience leading DevOps / platform teams or being a strong technical lead.
  • Ability to mentor engineers and raise the technical bar for the team.
  • Experience leading small/medium-sized projects using Agile/Scrum or similar methodologies.
  • Strong communication skills, able to translate technical constraints into clear options and decisions for stakeholders.
  • Demonstrated ability to own outcomes: from design through implementation, rollout, and ongoing operations.
  • Strong troubleshooting skills and the ability to stay calm under pressure during incidents.
  • Experience defining or implementing Internal Developer Platforms.
  • Experience with API Gateway and stream processing platforms (e.g., Apache Flink).
  • Background in ransomware preparedness, disaster recovery planning, or business continuity for critical systems.
  • Experience operating systems at multi-region or multi-tenant scale.

Diversity, Inclusion, and Equal Opportunity
