Senior Site Reliability Engineer

Hippocratic AI

Software Engineering
Palo Alto, CA, USA
Posted on Nov 21, 2024

About Us:

Hippocratic AI is developing the first safety-focused Large Language Model (LLM) for healthcare. Our mission is to dramatically improve healthcare accessibility and outcomes by bringing deep healthcare expertise to every person. No other technology has the potential for this level of global impact on health.

Why Join Our Team:

  • Innovative mission: We are creating a safe, healthcare-focused LLM that can transform health outcomes on a global scale.

  • Visionary leadership: Hippocratic AI was co-founded by CEO Munjal Shah alongside physicians, hospital administrators, healthcare professionals, and AI researchers from top institutions including El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Meta, Microsoft and NVIDIA.

  • Strategic investors: Raised $137 million from top investors including General Catalyst, Andreessen Horowitz, Premji Invest, SV Angel, NVentures (Nvidia Venture Capital), and Greycroft.

  • Team and expertise: We are working with top experts in healthcare and artificial intelligence to ensure the safety and efficacy of our technology.

For more information, visit www.HippocraticAI.com.

We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA, unless explicitly noted otherwise in the job description.

About the role:

We are seeking a highly skilled Senior Site Reliability Engineer to join our team. In this role, you will design and implement infrastructure automation and continuous integration and delivery pipelines, and you will monitor and scale the infrastructure that powers our healthcare AI platform. You will work closely with software engineers, research scientists, and other cross-functional teams to develop and maintain reliable and scalable infrastructure that enables rapid iteration and deployment of our products.

Key Responsibilities:

  • Design and implement infrastructure automation and deployment pipelines using tools such as Terraform, Ansible, and Jenkins

  • Implement and maintain monitoring and logging systems to ensure the reliability and performance of our healthcare AI platform

  • Work closely with software engineers to design and deploy scalable, fault-tolerant, and secure production systems on cloud platforms such as AWS, GCP, or Azure

  • Develop and maintain security and compliance policies and procedures for our healthcare AI platform

  • Collaborate with cross-functional teams to troubleshoot and resolve complex issues related to infrastructure, deployment, and operations

  • Implement and maintain disaster recovery and business continuity plans

  • Develop and maintain documentation related to infrastructure, deployment, and operations

  • Mentor and provide technical guidance to junior engineers

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field

  • At least 5 years of professional experience in DevOps engineering or a related field

  • Expertise in infrastructure automation and deployment tools such as Terraform, Ansible, Jenkins, or GitLab CI/CD

  • Experience with cloud platforms such as AWS, GCP, or Azure

  • Strong knowledge of containerization technologies such as Docker and Kubernetes

  • Experience with monitoring and logging tools such as ELK, Grafana, or Datadog

  • Familiarity with security and compliance best practices and tools such as HashiCorp Vault, AWS KMS, or Azure Key Vault

  • Strong problem-solving skills and ability to work independently and collaboratively in a team environment

  • Excellent communication and interpersonal skills

  • Experience implementing HIPAA and SOC2 compliance is a plus

  • Experience working in an HPC environment is a plus