Senior Applied Machine Learning Scientist
Sanas
Software Engineering
Europe
Posted on Feb 13, 2025
Sanas is revolutionizing the way we communicate with the world’s first real-time algorithm designed to modulate accents, eliminate background noise, and magnify speech clarity. Pioneered by seasoned startup founders with a proven track record of creating and steering multiple unicorn companies, our groundbreaking GDP-shifting technology sets a gold standard.
Sanas is a 200-strong team, established in 2020. In this short span, we’ve successfully secured over $100 million in funding. Our innovations have been supported by the industry’s leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, Quiet Capital, and others. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you’re not just adopting a product; you’re investing in the future of communication.
We are seeking a Senior Applied Machine Learning Scientist with deep expertise in foundational modeling and large-scale speech AI systems. In this role, you will lead the development of advanced models that push the boundaries of speech processing, including self-supervised learning, large-scale pretraining, and multimodal architectures. Your focus will be on scaling models efficiently while ensuring real-time performance, robustness, and adaptability to diverse environments.
This position requires a strong foundation in ML techniques, an innovative mindset, and a deep commitment to continuous improvement of deployed systems.
Key Responsibilities:
- Architect, train, and optimize large-scale speech AI models, including speech-to-speech, speech restoration, and speech translation.
- Leverage self-supervised learning, contrastive learning, and transformer-based architectures (e.g., wav2vec, Whisper, GPT-style models) to improve model accuracy and adaptability.
- Develop efficient model distillation and quantization strategies to deploy large models with low-latency inference.
- Innovate on cross-lingual and multilingual speech processing using large-scale pretraining and fine-tuning.
- Curate and scale massive, diverse, multilingual, and multimodal datasets for robust model training.
- Apply active learning, domain adaptation, and synthetic data generation to overcome data limitations.
- Lead efforts in data quality assessment, augmentation, and curation for large-scale training pipelines.
- Develop distributed training strategies for large-scale models using cloud-based and on-prem GPU clusters.
- Design and implement scalable model evaluation frameworks, tracking WER, MOS, and latency across diverse scenarios.
- Optimize real-time inference pipelines to ensure high-throughput, low-latency speech processing.
- Stay ahead of advancements in foundational models, generative AI, and large-scale speech modeling.
- Collaborate with academia, open-source communities, and research partners to drive innovation.
- Work closely with MLOps, Data Engineering, and Product teams to deploy scalable AI systems.
- Ensure seamless integration of foundational models with edge devices, real-time applications, and cloud platforms.
- Translate cutting-edge research into production-grade models that power real-world communication.
Must have qualifications:
- Bachelor’s, Master’s, or Ph.D. degree in Computer Science, Electrical Engineering, or a related field with a focus on Machine Learning, Deep Learning, or Speech Processing.
- 5+ years of hands-on industry experience in developing and implementing the following systems:
  - Speech-to-text (ASR)
  - Text-to-speech (TTS)
  - Voice conversion & speech enhancement
  - Speech translation & multimodal learning
- Strong proficiency in transformer-based architectures (e.g., wav2vec 2.0, Whisper, GPT, BERT).
- Expertise in deep learning frameworks such as PyTorch and TensorFlow, and in large-scale training techniques.
- Experience with distributed training and optimization across multi-GPU clusters.
- Strong understanding of self-supervised learning, contrastive learning, and generative modeling for speech AI.
- Hands-on experience with cloud-based AI platforms (AWS, GCP, Azure) and model deployment.
Preferred experience:
- Experience in developing multimodal AI models integrating speech, text, and vision.
- Track record of publishing in top-tier AI/ML conferences.
- Experience optimizing large models for real-time inference on edge devices.
- Proficiency with MLOps best practices for deploying and monitoring models in production.
- Familiarity with open-source ASR/TTS toolkits.