AI Infrastructure Engineer
Percepta
Location: New York City
Employment Type: Full time
Location Type: On-site
Department: Engineering
Compensation: $180K – $300K • Offers Equity
Who we are
Percepta's mission is to transform critical institutions with applied AI. We care that the industries that power the world (healthcare, manufacturing, energy) benefit from frontier technology.
We collaborate with industry-leading customers to drive AI transformation. We bring:
Forward-deployed expertise in engineering, product, and research
Mosaic, our in-house toolkit for rapidly deploying agentic architectures
Strategic partnerships with Anthropic, McKinsey, AWS, and the General Catalyst portfolio
Our team is a fast-growing group of Applied AI Engineers, Embedded Product Managers, and Researchers motivated by getting frontier AI into the places that actually run the world.
Percepta is a direct partnership with General Catalyst.
About the role
We're hiring an AI Infrastructure Engineer to own the infrastructure, deployment, and operational reliability that powers Percepta's AI systems, including the autonomous agents at the core of what we ship.
Part of the work is hardening what exists: tightening our Terraform footprint, strengthening deployment pipelines, bringing more rigor to how we manage infrastructure across regions and providers. Part of it is building what's missing. And part of it is genuinely new territory: figuring out what SRE means when the systems you're operating make autonomous decisions.
The infrastructure patterns for the agentic systems of the future don't exist yet. You'll help define them.
Why this is different
You're deploying autonomous systems. The infrastructure contract changes when your workloads have agency.
Observability means understanding why an agent made a decision, not just whether a pod is healthy.
The gap between research and production is real here. Our teams move optimization algorithms and AI systems from research environments into production, and you'll be part of that handoff. MLOps experience isn't required, but you'll be closer to that boundary than most infra roles.
Small team. Real ownership. You're making foundational decisions, not inheriting someone else's.
What you'll do
Define infrastructure patterns for multi-agent systems that need to be observable, controllable, and recoverable in ways traditional apps don't require
Own and evolve our IaC stack: Terraform and Kubernetes across AWS, GCP, and Azure
Build observability primitives for agentic workflows, tracing agent decisions and execution paths, not just service latency and pod health
Design and maintain CI/CD pipelines that give teams fast, trustworthy feedback from commit to production
Build operational foundations: monitoring, alerting, incident response, and the new patterns that emerge when AI systems are participants in that response
Work across engineering teams to meet the reliability and compliance requirements of the institutions we serve (SOC 2, HIPAA, regulated environments in healthcare and energy)
What we're looking for
5+ years building and operating production infrastructure in DevOps or SRE roles
The kind of engineer who sees a manual process and can't rest until it's automated well, not just scripted
Strong hands-on Terraform experience
Deep experience with at least 1 major cloud provider (AWS, GCP, or Azure): networking, IAM, cost management, the operational realities of production workloads
Solid Docker and Kubernetes experience in production. We run managed clusters across all 3 major clouds; this is a core part of the role
Experience designing and maintaining CI/CD pipelines (GitHub Actions, GitLab CI, or similar)
Scripting proficiency in Python, Bash, or similar
High agency: you don't wait for a ticket to fix what's broken, but you communicate, collaborate, and bring the team along
Genuine curiosity about AI systems, not just the infrastructure running them. You want to understand what you're operating
You find it interesting (not alarming) that some systems you'll operate will be making decisions on their own
Nice to have
Multi-region and multi-cloud experience across 2+ providers
Experience with single-tenant or on-prem deployments alongside multi-tenant SaaS
Familiarity with GitOps patterns and progressive delivery
Familiarity with the Grafana stack (Prometheus, Grafana, Loki) or equivalent
Experience with compliance frameworks (HIPAA, SOC 2) and how they shape infrastructure decisions in regulated environments
Background supporting ML or research workflows moving to production: model deployment, pipeline orchestration, or similar
You've thought about what observability means for non-deterministic systems and have opinions about it
The infrastructure patterns for autonomous AI systems are still being written. If you want to be one of the people writing them, let's talk.
Our Values
Dream bigger: We have the unique privilege of taking on the most ambitious problems and we should chase them with optimism, responsibility, and genuine belief that we can make it happen. We have to embrace the hard things when no one else will.
Heart in the game: What we're doing matters and we have to give a shit. Internally, that means fixing badness when you find it. Externally, it means honoring the trust our customers place in us with their most important problems. This isn't a 9-5, nor is it a job where we're ever going to monitor your hours. We promise to put work in front of you that matters and in return, we ask you to promise to care.
Win for the customer: Everyone is an engineer and the job of an engineer is to deliver outcomes, not outputs. Everything we do—the products we build, the partnerships we launch, the strategy we set—exists to make our customers successful. Delivery is the strategy.
Make the call: Organizations are only as strong as the pace at which they make decisions. Everyone at Percepta should feel empowered to commit and shape the ambiguity in front of them. But "make the call" cuts both ways: make the decision and make the phone call. High-agency decision-making only works with high-bandwidth communication and we commit to never operating in silos.
Intensity with kindness: We believe in excellence in execution, candor in feedback, ruthlessness in prioritization, and survivalist urgency. We also believe you don't need to be an asshole to deliver on any of this. The trust built through shared kindness and vulnerability is what makes the intensity sustainable.