Senior Site Reliability Engineer - Remote EST

Mosaic.tech

Software Engineering
Washington, DC, USA
USD 170k-206k / year
Posted on Mar 30, 2026
Join us as a Senior SRE where you’ll bridge the gap between cutting-edge AI innovation and rock-solid production stability. Working independently from the East Coast, you will collaborate with our global DevOps teams to automate 70% of your workload while owning the reliability of our AWS/Kubernetes environment. This is a role for a production-hardened engineer who wants a strong voice in technology decisions and the opportunity to build the future of AI-driven operations.

This is a fully remote role; however, you must be physically located in the Eastern Time zone, able to work Eastern Time hours Monday through Friday, and willing to participate in on-call rotations.

Base salary for this role ranges from $170,000 to $206,000 per year.

  • 5+ years of experience as a Senior SRE or Production Engineer (this is a hard requirement).

  • Deep Production Expertise: You must have extensive experience managing live, high-traffic SaaS environments; developer-only backgrounds without ops experience will not be a fit.

  • Cloud & Orchestration: Proven mastery of Kubernetes and AWS in production settings.

  • Coding/Scripting: Advanced proficiency in Python (preferred) or Go for automation; we need more than just Bash skills.

  • AI Knowledge: A strong understanding of, or direct experience with, AI/LLM technologies.

  • Observability: Hands-on experience with Datadog for monitoring and incident response.

  • Autonomy: Ability to work independently without direct daily oversight, managing production incidents and on-call responsibilities.

  • Time Zone: Located in the Eastern Time zone to provide coverage overlap with our global teams.

  • Design, build, and operate production-grade Kubernetes infrastructure on AWS

  • Develop AI agents to handle incidents and root cause analysis

  • Build and maintain GitOps-based CI/CD pipelines using GitHub Actions and ArgoCD

  • Develop internal DevOps tooling and developer self-service platforms

  • Own monitoring, observability, and operational excellence using Datadog

  • Collaborate with engineering teams to improve delivery speed and reliability