Senior Site Reliability Engineer - Remote EST
Mosaic.tech
This is a fully remote role; however, you must be physically located in the Eastern time zone (EST), be willing and able to work EST hours Monday-Friday, and participate in on-call rotations.
Base salary for this role ranges from $170,000 to $206,000 per year.
5+ years of experience as a Senior SRE or Production Engineer (this is a hard requirement).
Deep Production Expertise: You must have extensive experience managing live, high-traffic SaaS environments; developer-only backgrounds without ops experience will not be a fit.
Cloud & Orchestration: Proven mastery of Kubernetes and AWS in production settings.
Coding/Scripting: Advanced proficiency in Python (preferred) or Go for automation; we need more than just Bash skills.
AI Knowledge: A strong understanding of or direct experience with AI/LLM technologies.
Observability: Hands-on experience with Datadog for monitoring and incident response.
Autonomy: Ability to work independently without direct daily oversight, managing production incidents and on-call responsibilities.
Time Zone: Located in the Eastern time zone (EST) to provide coverage overlap with our global teams.
Design, build, and operate production-grade Kubernetes infrastructure on AWS
Develop AI agents to handle incidents and root cause analysis
Build and maintain GitOps-based CI/CD pipelines using GitHub Actions and ArgoCD
Develop internal DevOps tooling and developer self-service platforms
Own monitoring, observability, and operational excellence using Datadog
Collaborate with engineering teams to improve delivery speed and reliability