
Lead ML-Data Linguist



Software Engineering, Data Science
Bengaluru, Karnataka, India
Posted on Wednesday, January 24, 2024
Sanas is revolutionizing the way we communicate with the world’s first real-time algorithm designed to modulate accents, eliminate background noise, and magnify speech clarity. Pioneered by seasoned startup founders with a proven track record of creating and steering multiple unicorn companies, our groundbreaking GDP-shifting technology sets a gold standard. Our initial deployment is laser-focused on elevating the standards of customer experience centers. Testimonials from our partners reveal staggering double-digit improvements in mission-critical KPIs, coupled with boosts in CSAT and NPS. More than just a tool, our technology champions a bias-free workspace. This not only fosters a positive work environment but has also been instrumental in reducing employee attrition and curbing training expenditures.

Sanas is a 70-strong team, established in 2020. In this short span, we’ve successfully secured over $50 million in funding. Our innovations have been supported by the industry’s leading investors, including Insight Partners, Google Ventures, General Catalyst, Quiet Capital, and others. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you’re not just adopting a product; you’re investing in the future of communication.

Sanas is seeking a detail-oriented and self-motivated individual to join the ML Analytics Org as a Lead ML-Data Linguist (ML-DL). In this role, you will be responsible for evaluating, labeling and annotating speech data; handling data annotation requests from stakeholders; developing annotation guidelines for different use cases; and tracking and reporting quality metrics. You will work closely with the Linguists, Scientists and Engineers on the team to provide high-quality labeled data for training and testing Sanas speech models.

Key Responsibilities

  • Evaluate synthetic speech data for naturalness and intelligibility, and identify issues, errors and inconsistencies in accent translation
  • Label and annotate speech data according to annotation conventions, flagging issues such as phoneme mismatch, robotic sound, and hallucination
  • Handle data annotation and analysis requests from multiple stakeholders in a fast-paced environment. Prioritize and deliver work based on business needs
  • Develop and manage annotation guidelines for different data modalities in collaboration with customers and stakeholders
  • Drive testing of and onboarding onto the latest annotation tools
  • Track and report quality metrics, meeting all key performance indicators (KPIs) and service level agreements (SLAs) with customers and stakeholders
  • Maximize productivity, efficiency and quality through streamlined workflows, process standardization, documentation, audits, and investigations
  • Communicate effectively with team members to resolve issues and provide feedback on the annotation process
  • Onboard and train junior ML-DLs regularly

Basic Qualifications

  • A bachelor’s degree in Linguistics or a related field
  • Native or near-native fluency in English with an understanding of accentual nuances
  • Strong attention to detail and ability to maintain focus for extended periods
  • Excellent listening, comprehension, writing and presentation skills
  • A minimum of two years of experience in creating and managing annotation conventions for different data modalities
  • A minimum of two years of experience in creating and tracking quality- and throughput-specific metrics for different annotation conventions
  • A minimum of one year of experience in mentoring junior ML-DLs

Preferred Qualifications

  • A master’s degree in Linguistics
  • Demonstrated experience handling multiple priorities simultaneously
  • Previous experience working with speech data

ML Analytics Organization at SANAS AI

  • SANAS AI's ML Analytics Organization is a dynamic hub of innovation, dedicated to reshaping analytics in speech technology. Comprising diverse professionals including seasoned Applied Scientists, Speech Scientists, Computational Linguists, Data Linguists, and Software Developers, our team pioneers advanced evaluation metrics and frameworks for speech models. We empower large-scale speech evaluations with a self-serve platform. Additionally, we manage a robust internal data annotation platform that caters to diverse data modalities. At the forefront of science initiatives, we drive accuracy, automation, and efficiency, ensuring objective and exhaustive evaluations of speech quality. Our commitment to advancing the field positions us as leaders in revolutionizing the landscape of speech technology.

A Day in the Life of Soham, Lead ML-DL at ML Analytics Org

  • Soham starts their day by analyzing synthetic speech data from the new Speech Model for naturalness and intelligibility, adeptly identifying nuances and discrepancies. Juggling diverse stakeholder requests, Soham thrives in a fast-paced environment, prioritizing annotation tasks based on business needs. Collaborating seamlessly with the Scientists and Engineers on the team, Soham crafts annotation guidelines tailored for various data modalities, ensuring alignment with customer expectations.
  • Beyond annotation, Soham delves into the realm of qualitative and quantitative metrics, tracking and reporting KPIs and SLAs. Soham’s commitment to efficiency shows in their consistent effort to streamline evaluation workflows, standardize processes, and communicate proactively with team members to resolve issues. As a mentor, Soham regularly onboards and trains budding ML-Data Linguists, fostering a culture of continuous learning. In this dynamic role, Soham plays a pivotal part in shaping our success through collaboration and dedication to excellence.