SRE - Staff Engineer - US Remote
ABOUT THE ROLE: SRE – Staff Engineer
The Aviatrix SRE team is a small global team of talented Systems Engineers/SREs, whose primary mission is to ensure the reliability, availability and performance of Aviatrix’s critical systems and services. This includes managing and collaborating on infrastructure components, rollouts, monitoring, incident management, capacity planning, change management, disaster recovery and security/compliance. The Aviatrix SRE team envisions a robust and resilient infrastructure that enables Aviatrix to deliver high-quality services to its users in an agile manner through automation, best practices, and a culture of operational excellence. The overall goal is to minimize service disruptions, improve system efficiency, and maintain a high level of customer satisfaction.
We are currently expanding the US-based SRE team and are searching for Systems Engineers/SREs at all levels, including Staff and Principal Engineers, to join the Aviatrix US SRE team.
As an SRE – Staff Engineer, you will be responsible for designing, implementing, maintaining, and deploying highly available, fault-tolerant, complex, scalable and reliable systems by implementing automation, effective monitoring, and Infrastructure-as-Code (IaC).
The Tech Stack: You will work extensively with Kubernetes, including homemade operators and cdk8s, to manage application lifecycles, automate operational tasks, troubleshoot issues, integrate monitoring and alerting, optimize infrastructure, and ensure the reliability of applications running on Kubernetes. You will use Terraform to implement Infrastructure-as-Code (IaC) to enable rapid provisioning, configuration changes, and scaling. You will develop automation tools and frameworks in Golang and Python.
ON CALL ROTATION: You will maintain an effective on-call rotation to ensure 24/7 coverage.
LOCATION: This role allows you to work remotely from anywhere in the US or Canada. You must be eligible to work in the US or Canada and currently live in one of these countries.
- Ensure Reliability and Availability: You will ensure uptime for crucial services and systems based on required business SLOs. Minimize service disruptions through proactive monitoring, capacity planning and fault-tolerant design.
- Architecture and System Design: You will design and architect complex, scalable and reliable systems.
- Automation and Efficiency: You will develop and implement automation tools and frameworks that automate routine tasks, reduce human error, and streamline operational processes to increase efficiency.
- Build Observability and Monitoring Tools: You will define, build, deploy, maintain, and extend our observability and monitoring tools to enhance system reliability and availability.
- Incident Management and Response: You will maintain an effective on-call rotation to ensure 24/7 coverage. You will follow incident response procedures to swiftly address and mitigate service disruptions.
- Performance Monitoring and SLIs/SLOs: You will help define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to set clear expectations for system performance.
- Collaboration: You will work closely with product engineering to ensure service-level objectives and reliability targets are met.
- Problem-Solving & Troubleshooting: You will respond to escalations by troubleshooting complex system and application incidents, performing root cause analysis, and implementing necessary corrective actions.
- Thought Leadership and Innovation: You will stay up to date with the latest industry trends and emerging technologies, and iterate on best practices to increase the quality and velocity of development and deliverables.
- 8+ years of experience maintaining and deploying highly available, fault-tolerant systems at scale
- Proficiency in Golang or Python is required
- Infrastructure-as-Code (IaC): Deep understanding of Terraform core components, with real-world experience using Terraform for infrastructure provisioning and management (Terragrunt is a bonus)
- Experience with at least one cloud service provider (e.g., AWS, GCP, Azure, OCI)
- Good knowledge of Kubernetes (cdk8s and operators are a bonus)
- Solid experience developing automation tools and frameworks
- Experience with logging solutions (e.g., Loki, Syslog, Elasticsearch, Logstash, Kibana, Filebeat, Fluent Bit)
- Experience with monitoring and metrics solutions (e.g., Prometheus, Grafana, VictoriaMetrics)
- Practical experience with Linux system administration
- Experience with version control systems (e.g., Git, GitHub) and code review
- Excellent communication skills are required
US Pay Range
The US annual base salary range for this full-time position is $165,000-$270,000 + benefits + 401(k) match + equity. The pay range is determined by the role, work location, job-related skills, level, experience, and relevant education. [Certain roles are eligible to earn sales commission, depending on the terms of the applicable plan.] The range displayed is the minimum and maximum target base salary and is applicable only for new hires for the listed position located in the US. Your Talent Advisor can share more details regarding salary ranges, benefits, and equity for your location during the hiring process.
US: We cover 100% of employee premiums and 88% of dependent(s) premiums for medical, dental and vision coverage, 401(k) match, short and long-term disability, life/AD&D insurance, $1,000/year education reimbursement, and a flexible vacation policy.
Outside the US: We offer a comprehensive benefits package which (subject to regional variations) could include pension, private medical for you and dependents, generous holiday allowance, life assurance, long-term disability, and an annual wellbeing stipend.
Your total compensation package will be based on job-related knowledge, education, certifications and location, per our aligned ranges.
Aviatrix, the pioneer of Secure Cloud Networking, optimizes business-critical application availability, performance, security, and cost with multicloud networking software that delivers a simplified and consistent enterprise-grade operational model in and across cloud service providers. Combined with the Aviatrix Certified Engineer (ACE) program, the industry’s first and only multicloud networking certification, innovative enterprises are transforming their business by upgrading their cloud networking with Aviatrix. Learn more at www.aviatrix.com.
WE WANT TO INCLUDE YOU
We embrace the fact that not everyone’s journey took the same route or started at the same place. If your experience doesn’t quite meet the requirements but the opportunity excites you and you believe you could be great, don’t let that hold you back from applying. Tell us what you CAN bring and what makes you special.
Aviatrix is a community where everyone's career can grow, and we want to help you achieve your goals and be “your best YOU,” however that looks. If you're seeking an opportunity where you can be excited to start work every morning with enthusiastic people, make a real difference, and be part of something amazing, then let’s talk. We want to get to know you and how we could grow together.
Aviatrix, Inc. is an equal opportunity employer and does not make hiring decisions based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.