Staff Engineer - Software Development
Aviatrix
Who We Are:
For enterprises struggling to secure cloud workloads, Aviatrix® offers a single solution for pervasive cloud security. Where current cybersecurity approaches focus on securing entry points to a trusted space, Aviatrix Cloud Native Security Fabric (CNSF) delivers runtime security and enforcement within the cloud application infrastructure itself – closing gaps between existing solutions and helping organizations regain visibility and control. Aviatrix helps security, cloud, and networking teams support developer velocity, AI, serverless, and what’s next. For more information, visit www.aviatrix.com.
About the Role: Staff Engineer – Software Development
Our SaaS platform serves enterprise customers with a reliable, secure, and scalable product built on modern microservices, event-driven data pipelines, and, increasingly, AI-powered capabilities.
We are seeking a Staff Engineer – Software Development to join our SaaS Platform team. Staff Engineers at Aviatrix operate at the intersection of technical depth and cross-team influence. You will drive the design and delivery of complex backend systems, including AI-augmented data pipelines and agentic AI integrations, while raising the technical bar across the organization. This role is ideal for an engineer who is deeply technical, thrives on solving ambiguous problems, and is ready to shape both the platform’s architecture and the team’s engineering culture.
Responsibilities
Architecture & Technical Leadership
- Lead the architecture and delivery of complex backend platform features spanning multiple microservices; break down large, ambiguous projects into well-scoped sub-projects and guide senior engineers through execution.
- Define technical standards, patterns, and best practices for backend microservices development across the team — covering API design, service communication, data modeling, and reliability.
- Own the architectural design of AI-augmented data pipelines: integrating model inference, agentic workflows, and LLM-based processing steps into streaming and batch data flows on AWS.
- Evaluate and recommend agentic AI frameworks and tooling (such as LangChain, LlamaIndex, Amazon Bedrock Agents, or equivalent) for use in platform components; establish integration patterns the broader team can follow.
- Write and present comprehensive design documents and technical proposals that articulate complex problems, solution trade-offs, and long-term platform direction.
- Drive improvements to Aviatrix’s product and security posture, including contributions outside your immediate project area.
AI-Augmented Data Pipelines
- Design and implement data pipelines that incorporate AI/ML model inference as a first-class processing step, enabling intelligent enrichment, classification, anomaly detection, and summarization of platform telemetry and events.
- Architect agentic AI workflows that operate autonomously within data pipelines — orchestrating tool use, retrieval-augmented generation (RAG), and multi-step reasoning to drive platform automation.
- Integrate large language models (LLMs) and foundation models via managed APIs (Amazon Bedrock, OpenAI, Anthropic) and self-hosted inference endpoints into backend services and pipeline stages.
- Build robust prompt engineering and context management layers that make AI model calls reliable, observable, cost-efficient, and safe in a production SaaS context.
- Establish guardrails, evaluation frameworks, and monitoring strategies for AI components in pipelines — including output validation, hallucination detection, latency tracking, and cost controls.
- Collaborate with data and ML teams to define feature stores, embedding pipelines, vector database integrations (e.g., Amazon OpenSearch, pgvector), and model lifecycle management practices.
Backend & Microservices Engineering
- Design and implement high-performance, production-grade microservices in Golang using gRPC and REST; ensure services are secure, observable, and operationally resilient.
- Drive the design of event-driven architectures using AWS data pipeline services — Kinesis, MSK (Kafka), Glue, Step Functions, and EMR — to support real-time and batch processing at scale.
- Enforce strong security and networking practices: AWS VPC architecture, IAM least-privilege access, PrivateLink for private service connectivity, and encryption in transit and at rest.
- Lead platform-wide reliability initiatives: SLO definition, distributed tracing, structured logging, alerting, and chaos engineering practices.
- Conduct rigorous design and code reviews; set a high bar for correctness, testability, and operational readiness.
Team Influence & Collaboration
- Serve as a technical anchor for the team — raising engineering quality through mentorship, design guidance, and a culture of rigorous technical thinking.
- Collaborate with product management and leadership to align technical roadmap investments with business priorities and customer needs.
- Partner with senior engineers across teams to drive consistency in platform architecture and avoid redundant solutions.
- Provide expert guidance during escalations and critical production incidents; drive root-cause analysis and long-term remediation.
- Contribute to recruiting and interviewing; help define bar-raising standards for backend and AI engineering roles.
Requirements
Experience & Education
- 7–10+ years of professional software engineering experience, with a strong track record of technical leadership on backend systems.
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience).
- Demonstrated experience designing and operating SaaS or PaaS products at scale, with multi-tenancy and high-availability requirements.
- Strong foundation in data structures, algorithms, operating systems, and distributed systems.
- Experience working in fast-paced, high-growth technology companies with geographically distributed teams.
Backend & Microservices
- Strong proficiency in Golang; experience building production-grade gRPC-based microservices.
- Deep understanding of distributed systems: consistency, fault tolerance, service discovery, back-pressure, and distributed tracing.
- Expertise in RESTful and gRPC API design, with a strong sense of API contracts, versioning, and backward compatibility.
- Experience building event-driven architectures with message queue and streaming systems (Kafka/MSK, Kinesis, SQS/SNS).
- Strong understanding of elegant and robust system design; ability to evaluate and articulate key engineering trade-offs.
Agentic AI & LLM Integration
- Hands-on experience integrating large language models (LLMs) and foundation models into production backend systems via managed APIs or self-hosted inference.
- Practical knowledge of agentic AI concepts: tool use, autonomous multi-step reasoning, agent orchestration, and human-in-the-loop patterns.
- Experience with at least one agentic AI framework or platform (e.g., LangChain, LlamaIndex, Amazon Bedrock Agents, AutoGen, or similar).
- Familiarity with retrieval-augmented generation (RAG): embedding models, vector databases (OpenSearch, pgvector, Pinecone), and context retrieval strategies.
- Understanding of prompt engineering principles: structured prompts, few-shot examples, chain-of-thought reasoning, and output validation.
- Experience designing AI guardrails and evaluation pipelines to ensure reliability, safety, and cost-efficiency of LLM components in production.
Data Pipelines
- Proven experience designing and operating large-scale data pipelines for real-time streaming and batch workloads.
- Strong working knowledge of AWS data pipeline services: Amazon Kinesis (Data Streams & Firehose), AWS Glue, Amazon MSK, AWS Step Functions, and Amazon EMR.
- Ability to embed AI/ML inference steps — including LLM calls, embedding generation, and model scoring — as processing stages within data pipelines.
- Familiarity with pipeline observability: data lineage, quality monitoring, SLA management, and alerting.
- Experience with schema management, data quality enforcement, and data catalog practices (e.g., AWS Glue Data Catalog).
AWS & Cloud Infrastructure
- Deep experience with AWS as a primary cloud platform; strong understanding of high-availability, multi-AZ deployment patterns.
- Solid knowledge of AWS networking and security: VPC architecture, IAM roles and policies, AWS PrivateLink, Transit Gateway, and security groups.
- Proficiency with Kubernetes (EKS), including workload scheduling, RBAC, networking (CNI), and cluster operations.
- Experience with infrastructure-as-code (Terraform, CDK, or CloudFormation) and CI/CD pipelines for cloud-native backend services.
- Familiarity with Amazon Bedrock or equivalent managed AI/ML platforms for model deployment and inference at scale.
Nice to Have
- Experience fine-tuning or adapting foundation models for domain-specific tasks.
- Familiarity with ML frameworks (PyTorch, HuggingFace Transformers) and model serving infrastructure (SageMaker, TorchServe).
- Background in network security, cloud networking, or SASE/SD-WAN technologies.
- Experience with multi-cloud environments (Azure, GCP) alongside AWS.
- Contributions to open-source AI, data engineering, or cloud-native projects.
Interpersonal & Communication
- Excellent written and verbal communication; able to produce clear technical documents and present confidently to engineering and product leadership.
- Demonstrated ability to lead through influence, build consensus across teams, and drive outcomes without direct authority.
- Team player who thrives in collaborative, geographically distributed environments and startup-paced cultures.
BENEFITS
US: We cover 100% of employee premiums and 88% of dependent premiums for medical, dental, and vision coverage. We also offer a 401(k) match, short- and long-term disability, life/AD&D insurance, $1,000/year education reimbursement, and a flexible vacation policy.
Outside the US: We offer a comprehensive benefits package which, subject to regional variations, could include a pension, private medical coverage for you and your dependents, a generous holiday allowance, life assurance, long-term disability coverage, and an annual wellbeing stipend.
Your total compensation package will be based on job-related knowledge, education, certifications and location, per our aligned ranges.
About Aviatrix
Aviatrix® is the cloud network security company trusted by more than 500 of the world’s leading enterprises. As cloud infrastructures become more complex and costly, the Aviatrix Cloud Network Security platform gives companies back the power, control, security, and simplicity they need to modernize their cloud strategies. Aviatrix is the only secure networking solution built specifically for the cloud, ensuring companies are ready for AI and what’s next. Combined with the Aviatrix Certified Engineer (ACE) Program, the industry’s leading secure multicloud networking certification, Aviatrix unifies cloud, networking, and security teams and unlocks greater potential across any cloud.
WE WANT TO INCLUDE YOU
We embrace the fact that not everyone’s journey took the same route or started at the same place. If your experience doesn’t quite meet the requirements but the opportunity excites you and you believe you could be great, don’t let that hold you back from applying. Tell us what you CAN bring and what makes you special.
Aviatrix is a community where everyone’s career can grow, and we want to help you achieve your goals and be “your best YOU,” however that looks. If you’re seeking an opportunity where you can be excited to start work every morning with enthusiastic people, make a real difference, and be part of something amazing, then let’s talk. We want to get to know you and how we could grow together.
Aviatrix, Inc. is an equal opportunity employer and does not make hiring decisions based on race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.
CPRA - California Applicant Privacy Notice