Operations Analyst
Hippocratic AI
Location
Palo Alto
Employment Type
Full time
Location Type
On-site
Department
Engineering
About Us
Hippocratic AI is the leading generative AI company in healthcare. We have the only system that can have safe, autonomous, clinical conversations with patients. We have trained our own LLMs as part of our Polaris constellation, resulting in a system with over 99.9% accuracy.
Why Join Our Team
Reinvent healthcare with AI that puts safety first. We’re building the world’s first healthcare‑only, safety‑focused LLM — a breakthrough platform designed to transform patient outcomes at a global scale. This is category creation.
Work with the people shaping the future. Hippocratic AI was co‑founded by CEO Munjal Shah and a team of physicians, hospital leaders, AI pioneers, and researchers from institutions like El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Meta, Microsoft, and NVIDIA.
Backed by the world’s leading healthcare and AI investors. We recently raised a $126M Series C at a $3.5B valuation, led by Avenir Growth, bringing total funding to $404M with participation from CapitalG, General Catalyst, a16z, Kleiner Perkins, Premji Invest, UHS, Cincinnati Children’s, WellSpan Health, John Doerr, Rick Klausner, and others.
Build alongside the best in healthcare and AI. Join experts who’ve spent their careers improving care, advancing science, and building world‑changing technologies — ensuring our platform is powerful, trusted, and truly transformative.
Location Requirement
We believe the best ideas happen together. To support fast collaboration and a strong team culture, this role is expected to be in our Palo Alto office five days a week, unless otherwise specified.
About the Role
We are seeking a highly reliable and detail-oriented Operations Analyst to ensure the continuous, 24×7 operation of Hippocratic AI’s production systems, integrations, and customer/partner environments. This role is critical to minimizing customer and partner downtime, maintaining trust, and ensuring our AI agents and supporting systems operate smoothly at all times.
As an Operations Analyst, you will be responsible for monitoring system alerts, integrations, and operational reports; performing proactive maintenance; resolving common operational issues; and triaging advanced issues to the appropriate engineering, platform, or partner teams. You will play a central role in detecting issues early, coordinating incident response, and maintaining operational excellence across all customer and partner deployments.
You will work closely with engineering, infrastructure, security, customer support, and partner teams, and will help build the operational tooling, reporting, and automation needed to scale Hippocratic AI safely and reliably.
What You’ll Do
Integration Management & Development
Own the full integration lifecycle for major customers, from gathering requirements through design, development, testing, deployment, and ongoing support to deliver seamless connectivity between Hippocratic AI and client systems.
Operations Monitoring & Incident Response
Monitor all production systems, integrations, and automated alerts to ensure 24×7 continuous operations across customers and partners.
Serve as a first-line responder for operational alerts, diagnosing and resolving standard issues within defined SLAs.
Triage complex or advanced issues and page/engage the appropriate on-call engineers, platform teams, or partner contacts.
Coordinate incident response activities, track progress to resolution, and ensure clear internal handoffs during escalations.
Validate system recovery and perform post-incident checks to ensure full service restoration.
Proactive Maintenance & Reliability
Perform proactive system health checks, integration validations, and routine maintenance to prevent outages and degradation.
Identify trends in alerts, incidents, and performance metrics to recommend preventative actions and long-term fixes.
Help define and refine operational runbooks, escalation paths, and standard operating procedures (SOPs).
Participate in on-call rotations and support after-hours and weekend coverage as needed to maintain 24×7 availability.
Reporting, Automation & Tooling
Create and maintain operational reports and dashboards for internal teams, customers, and partners.
Build and maintain scripts and automation to monitor system health, validate integrations, and generate customer- or partner-specific reports.
Customize operational reporting for each customer/partner to meet contractual, SLA, and compliance requirements.
Continuously improve monitoring, alerting, and observability tooling to reduce noise and increase signal quality.
Cross-Functional Collaboration
Work closely with engineering, infrastructure, security, and customer support teams to resolve incidents and improve system resilience.
Support customer-facing teams by providing operational insights, incident summaries, and root-cause analysis.
Assist with onboarding new customers and partners by validating integrations, monitoring readiness, and ensuring operational coverage.
Contribute to post-incident reviews and continuous improvement initiatives to strengthen overall platform reliability.
What You Bring
Must Have:
Bachelor’s degree in Computer Science, Information Systems, Health Informatics, Operations, Engineering, or a related field (or equivalent practical experience).
3+ years of experience in operations, site reliability, NOC, technical support, or production monitoring roles.
Hands-on experience monitoring production systems, integrations, APIs, or data pipelines in a 24×7 environment.
Familiarity with alerting and monitoring tools (e.g., Datadog, New Relic, CloudWatch, Prometheus, Grafana, PagerDuty, Opsgenie, or similar).
Ability to troubleshoot common system, integration, and data-flow issues using logs, metrics, and dashboards.
Experience writing scripts or automation using tools/languages such as Python, Bash, SQL, or similar.
Strong understanding of incident management processes, escalation procedures, and SLA-driven operations.
Excellent organizational skills with the ability to manage multiple alerts, issues, and priorities simultaneously.
Clear written and verbal communication skills, especially during high-pressure incidents.
Strong sense of ownership, reliability, and attention to detail.
Nice to Have:
Experience supporting cloud-based platforms (AWS, Azure, or Google Cloud).
Familiarity with REST APIs, webhooks, message queues, or integration workflows.
Experience in healthcare, regulated environments, or HIPAA-compliant systems.
Exposure to CI/CD pipelines, deployment monitoring, or change management processes.
Experience creating customer-facing operational or SLA reports.
Background in Site Reliability Engineering (SRE), DevOps, or production support for SaaS platforms.
Experience supporting AI/ML platforms, data pipelines, or real-time systems.
Please be aware of recruitment scams impersonating Hippocratic AI. All recruiting communication will come from @hippocraticai.com email addresses. We will never request payment or sensitive personal information during the hiring process.