Join our companies in their quest to drive powerful, positive change that endures.

Machine Learning Engineer

Datavolo

Software Engineering
United States
Posted on Friday, August 30, 2024

About us:

Be the 10x Data Engineer!

Datavolo is passionate about helping data engineers provide highly valuable data to their companies' data-driven products, and today that means fueling AI applications! We’re building on Apache NiFi to accelerate high-quality data capture, transformation, and delivery to vector, relational, graph, object, and other stores.

Join the Datavolo team! Help our customers easily harness unstructured and multi-modal data to fuel transformative Generative AI applications.

What you will do:

  • As a Datavolo Machine Learning Engineer (MLE), you'll be a key part of a team dedicated to productionizing machine learning applications and systems at scale
  • You’ll participate in the detailed technical design, development, and implementation of machine learning applications using existing and emerging technology platforms
  • You’ll focus on machine learning architectural design, develop and review model and application code, and ensure high availability and performance of our machine learning applications
  • You'll have the opportunity to continuously learn and apply the latest innovations and best practices in machine learning engineering
  • The MLE role overlaps with many disciplines, such as Ops, Modeling, and Data Engineering
  • Design, build, and/or deliver ML models and components that solve real-world business problems
  • Evaluate ML model and infrastructure performance based on accuracy, precision, recall, and speed. Perform independent benchmarking of ML models against both internal and external datasets. Tune models through hyperparameter optimization and retraining.
  • Solve complex problems by writing and testing application code, developing and validating ML models, and automating tests and deployment
  • Retrain, maintain, and monitor models in production
  • Leverage or build cloud-based architectures, technologies, and/or platforms to deliver optimized ML models at scale
  • Construct optimized data pipelines to feed ML models, including inference pre- and post-processing steps such as image normalization and non-maximum suppression (NMS)
  • Leverage continuous integration and continuous deployment best practices, including test automation and monitoring, to ensure successful deployment of ML models and application code
  • Ensure all code is well managed to reduce vulnerabilities, models are well governed from a risk perspective, and ML systems follow best practices in Responsible and Explainable AI

What you will have:

  • 5+ years of experience programming with Python
  • 3+ years of experience designing and building data-intensive solutions using distributed computing
  • 2+ years of on-the-job experience with industry-recognized ML frameworks and techniques (e.g., RAG, Generative AI, unstructured data, scikit-learn, PyTorch, OpenCV, NumPy, CUDA, Spark, or TensorFlow)
  • 1+ year of experience productionizing, monitoring, and maintaining models
  • Self-starter with a proven track record of ownership
  • Experience with K8s a plus
  • Experience contributing to Open Source software, especially with Apache NiFi, a plus
  • Experience with Open Source AI communities, such as Kaggle or Hugging Face, a plus
  • BS (or higher) in Computer Science strongly preferred.