Senior Site Reliability Engineer
Hippocratic AI
About Us:
Hippocratic AI is developing the first safety-focused Large Language Model (LLM) for healthcare. Our mission is to dramatically improve healthcare accessibility and outcomes by bringing deep healthcare expertise to every person. No other technology has the potential for this level of global impact on health.
Why Join Our Team:
Innovative mission: We are creating a safe, healthcare-focused LLM that can transform health outcomes on a global scale.
Visionary leadership: Hippocratic AI was co-founded by CEO Munjal Shah alongside physicians, hospital administrators, healthcare professionals, and AI researchers from top institutions including El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Meta, Microsoft and NVIDIA.
Strategic investors: We have raised $137 million from top investors including General Catalyst, Andreessen Horowitz, Premji Invest, SV Angel, NVentures (Nvidia Venture Capital), and Greycroft.
Team and expertise: We are working with top experts in healthcare and artificial intelligence to ensure the safety and efficacy of our technology.
For more information, visit www.HippocraticAI.com.
We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA, unless explicitly noted otherwise in the job description.
About the role:
We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, you will design and implement infrastructure automation and continuous integration and delivery pipelines, and monitor and scale the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable, scalable infrastructure that enables rapid iteration and deployment of our products.
Key Responsibilities:
Design and implement infrastructure automation and deployment pipelines using tools such as Terraform, Ansible, and Jenkins
Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform
Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure
Develop and maintain security and compliance policies and procedures for our healthcare AI platform
Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations
Implement and maintain disaster recovery and business continuity plans
Develop and maintain documentation related to infrastructure, deployment, and operations
Mentor and provide technical guidance to junior engineers
Qualifications:
Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field
At least 5 years of professional experience in DevOps engineering or a related field
Expertise in infrastructure automation and deployment tools such as Terraform, Ansible, Jenkins, or GitLab CI/CD
Experience with cloud platforms such as AWS, GCP, or Azure
Strong knowledge of containerization technologies such as Docker and Kubernetes
Experience with monitoring and logging tools such as ELK, Grafana, or Datadog
Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault
Strong problem-solving skills and ability to work independently and collaboratively in a team environment
Excellent communication and interpersonal skills
Experience implementing HIPAA and SOC 2 compliance is a plus
Experience working in an HPC environment is a plus