Data Scientist, Health System Optimization (Part-time Contract)
Sage Care
Job Title: Data Scientist, Health System Optimization (Part-time Contract)
Location: Hybrid, Palo Alto, CA (in office Tuesday through Thursday)
About Us:
At Sage Care, we’re transforming healthcare with AI-powered solutions to streamline care navigation for health systems. Our technology makes it easier for patients to find the right doctor, helps providers focus on the patients who need them most, and ensures faster access to care. Built by experts from Carbon Health, Apple, and Uber, our platform automates triage, enhances provider-patient matching, and maximizes appointment capacity—reducing wait times and improving overall efficiency. We're on a mission to make healthcare more accessible and efficient for everyone.
Role Overview:
As a Data Scientist on our team, you will play a pivotal role in transforming complex healthcare data into actionable insights that directly improve patient care and optimize health system efficiency. You will design, develop, and deploy advanced machine learning models and statistical analyses to uncover critical signals within a complex health system's data. You’ll collaborate closely with clinicians, engineers, and product stakeholders, acting as a bridge between business challenges and data-driven solutions. This role blends research-oriented investigation with hands-on implementation and requires deep domain knowledge in healthcare, strong programming skills, and a passionate commitment to improving patient care through data.
Contract Details:
This is a part-time contract position based in Palo Alto, CA. We're looking for an individual who can commit to approximately 8-10 hours per week for 8 weeks. This is a hybrid role, requiring in-office presence Tuesday through Thursday.
Key Responsibilities:
Strategic Data Analysis for System Optimization:
Lead the exploration and analysis of complex healthcare datasets (e.g., electronic health records, claims, operational metrics) to uncover patterns, bottlenecks, and opportunities for improving patient-doctor matching and optimizing health system throughput.
Develop and apply advanced analytical methods to extract predictive signals that inform product features and technology enhancements, driving greater efficiency and better patient outcomes.
Data Acquisition & Preparation:
Collect, clean, and preprocess diverse healthcare data sources, ensuring data quality, consistency, and strict adherence to HIPAA and privacy standards.
Design, implement, and maintain scalable ETL pipelines for seamless ingestion of data from internal databases and external APIs, with a strong focus on data integrity and security.
Machine Learning Model Development & Validation:
Build and deploy supervised and unsupervised learning models (e.g., regression, tree-based ensembles, deep learning) to address critical challenges such as patient flow optimization, resource utilization forecasting, and predictive patient segmentation to enhance matching accuracy.
Conduct thorough model validation—including cross-validation, hyperparameter tuning, and calibration—to ensure models perform robustly and generalize effectively in real-world clinical and operational environments.
Statistical Analysis & Insight Generation:
Perform deep exploratory data analysis (EDA) to identify trends, outliers, and biases in large-scale healthcare data, with a focus on factors impacting system efficiency and patient access.
Apply sophisticated statistical techniques (e.g., survival analysis, time-series modeling) to solve clinically and operationally relevant problems, directly influencing product improvements and technological innovation.
Cross-Functional Collaboration & Impact:
Collaborate closely with software engineers and DevOps teams to productionize models within cloud environments (AWS, GCP, Azure), ensuring scalability, reliability, and maintainability.
Partner with clinicians and healthcare domain experts to align modeling objectives with clinical workflows and validate the real-world applicability and impact on patient care and system throughput.
Present clear, compelling insights, visualizations, and actionable recommendations to both technical and non-technical stakeholders, translating complex data into strategic product and business decisions.
Model Monitoring, Maintenance & Continuous Improvement:
Develop and implement robust monitoring frameworks to track model performance, detect data drift, and automate retraining pipelines to maintain accuracy and relevance over time.
Continuously enhance models through iterative feedback from end users and new data sources, driving sustained optimization of patient matching and system efficiency.
Compliance & Documentation Excellence:
Ensure all data science processes strictly comply with regulatory frameworks (e.g., HIPAA, GDPR) and internal data governance policies, safeguarding patient privacy and data security.
Maintain detailed documentation of model designs, feature engineering, validation procedures, and ethical considerations to guarantee reproducibility, auditability, and transparency.
Required Qualifications:
Bachelor’s or Master’s degree in Data Science, Statistics, Computer Science, Biomedical Engineering, or a related quantitative field.
3+ years of professional experience in a data science or machine learning role, preferably within healthcare, biotechnology, or a similarly regulated environment.
Proficiency in Python (preferred) and/or R, with strong experience in libraries such as scikit-learn, TensorFlow/PyTorch, pandas, and NumPy.
Solid understanding of statistical modeling and machine learning algorithms (e.g., logistic regression, random forests, gradient boosting, neural networks).
Demonstrated experience with experiment design, A/B testing, and causal inference.
Experience with SQL and relational databases; familiarity with NoSQL solutions (e.g., MongoDB) is a plus.
Demonstrated ability to preprocess and analyze large, heterogeneous healthcare datasets, including structured EHR data and semi-structured clinical notes.
Hands-on experience with cloud platforms (AWS, GCP, or Azure) for data storage, computing, and model deployment.
Familiarity with data privacy regulations and best practices for handling Protected Health Information (PHI).
Excellent problem-solving skills, with the ability to translate ambiguous real-world problems into well-defined data science projects.
Strong communication skills and the ability to present technical results to cross-functional teams.
Preferred Qualifications:
Advanced degree (Ph.D. or MS) in a quantitative discipline with a research focus on healthcare or biomedical applications and/or at least 3 years of relevant experience.
Experience developing and deploying deep learning models for medical imaging (e.g., using CNNs on radiology or pathology data).
Knowledge of natural language processing (NLP) techniques applied to clinical text (e.g., clinical named-entity recognition, transformer-based models).
Experience with search, recommendation, or retrieval-related modeling.
Familiarity with healthcare standards such as HL7, FHIR, SNOMED CT, and ICD-10.
Prior work with time-to-event data and survival analysis in a clinical research or biostatistics setting.
Experience with Docker, Kubernetes, and CI/CD pipelines for scalable model production.
Published research in peer-reviewed journals or conferences related to healthcare AI or biostatistics.
How to Apply:
Please submit your resume, a brief cover letter highlighting relevant healthcare AI projects (e.g., publications, Kaggle competitions, or past product contributions), and links to any code repositories or portfolios.