Principal Data Engineer
Tendo
Data Science
San Francisco, CA, USA
USD 157,250-212,750 / year + Equity
Responsibilities
- Collaborate with Data Scientists and Business Intelligence Analysts to ensure efficient and effective data processing and analysis.
- Optimize data infrastructure and processes for performance and scalability.
- Develop and maintain data documentation and data lineage.
- Stay current with emerging technologies and industry trends related to data engineering.
Requirements
- 7+ years of experience in data engineering.
- Extensive experience designing, building, and maintaining ETL data pipelines.
- Strong coding skills in Python or Scala, with a focus on data processing.
- Experience using Apache Spark (PySpark or Scala).
- Experience with AWS technology stack (S3, Glue, Athena, EMR, etc.).
- Experience with data and entity relationship modeling to support data warehouses and analytics solutions.
- Deep understanding of relational and non-relational databases (SQL/NoSQL).
- Comfortable working with unstructured and semi-structured data (e.g., web-scraped data).
- Experience working in a professional software environment using source control (Git), issue tracking and documentation tools (e.g., Jira, Confluence), continuous integration, code reviews, and an agile development process (Scrum/Lean).
- Familiarity with basic data privacy and security principles.
- Interest and/or experience in AI/ML applications, including support for model development or deployment workflows.
- Proactive mindset toward exploring emerging technologies in AI and data science to drive innovation.
Nice to Have
- Knowledge of, or experience with, healthcare data standards such as HL7, FHIR, ICD, SNOMED, LOINC.
- Experience with Delta Lake and/or Databricks.
- Hands-on experience with machine learning workflows, including preparing data for model training and evaluation and meeting the data requirements of ML frameworks.
- Experience validating data quality, preferably with test automation.
- Experience with containerization using Docker.