ML-Data Linguist

Sanas

Software Engineering, Data Science
Bengaluru, Karnataka, India
Posted on Wednesday, January 24, 2024
Sanas is revolutionizing the way we communicate with the world’s first real-time algorithm, designed to modulate accents, eliminate background noise, and magnify speech clarity. Pioneered by seasoned startup founders with a proven track record of creating and steering multiple unicorn companies, our groundbreaking GDP-shifting technology sets a gold standard. Our initial deployment is laser-focused on elevating the standards of customer experience centers. Testimonials from our partners reveal staggering double-digit improvements in mission-critical KPIs, coupled with boosts in CSAT and NPS. More than just a tool, our technology champions a bias-free workspace. This not only fosters a positive work environment but has also been instrumental in reducing employee attrition and curbing training expenditures.

Sanas is a 70-strong team, established in 2020. In this short span, we’ve successfully secured over $50 million in funding. Our innovations have been supported by the industry’s leading investors, including Insight Partners, Google Ventures, General Catalyst, and Quiet Capital. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you’re not just adopting a product; you’re investing in the future of communication.

Sanas is seeking a detail-oriented and self-motivated individual to join our team as an ML-Data Linguist (ML-DL). In this role, you will be responsible for evaluating, labeling, and annotating speech data; handling data annotation requests from stakeholders in close collaboration with other ML-DLs; revisiting annotation guidelines for different use cases; and tracking and reporting individual and group quality metrics on a regular cadence. You will work closely with the Lead ML-DLs, Linguists, Scientists, and Engineers on the team to provide high-quality labeled data for training and testing Sanas speech models.

Key Responsibilities

  • Evaluate synthetic speech data for naturalness and intelligibility, and identify issues, errors and inconsistencies in accent translation
  • Follow established annotation conventions and quality assurance procedures to ensure the accuracy and consistency of annotated data
  • Label and annotate speech data following established annotation conventions, using labels such as phoneme mismatch, robotic sound, and hallucination
  • Work closely with the Lead ML-DL to prioritize business-sensitive tasks, and communicate effectively with team members to resolve issues and provide feedback on the annotation conventions and process


Basic Qualifications

  • A bachelor’s degree in Linguistics or a related field
  • Native or near-native fluency in English with an understanding of accentual nuances
  • Strong attention to detail and ability to maintain focus and interest for extended periods while doing monotonous process-based tasks on a daily basis
  • Excellent listening, comprehension, writing and presentation skills
  • Ability to follow guidelines with utmost care and complete time-bound tasks efficiently
  • Comfortable working with technology and learning new tools and software
  • Ability to track and report quality metrics and to meet all key performance indicators (KPIs) and service level agreements (SLAs)


Preferred Qualifications

  • A master’s degree in Linguistics
  • Demonstrated experience in handling multiple priorities simultaneously
  • Previous experience working with speech data


ML Analytics Organization at SANAS AI

  • SANAS AI's ML Analytics Organization is a dynamic hub of innovation, dedicated to reshaping analytics in speech technology. Comprising diverse professionals including seasoned Applied Scientists, Speech Scientists, Computational Linguists, Data Linguists, and Software Developers, our team pioneers advanced evaluation metrics and frameworks for speech models. We empower large-scale speech evaluations with a self-serve platform. Additionally, we manage a robust internal data annotation platform that caters to diverse data modalities. At the forefront of science initiatives, we drive accuracy, automation, and efficiency, ensuring objective and exhaustive evaluations of speech quality. Our commitment to advancing the field positions us as leaders in revolutionizing the landscape of speech technology.


A Day in the Life of Suparn, ML-DL at ML Analytics Org

  • Suparn starts each day by scrutinizing synthetic speech data from the new Speech Model, assessing naturalness and intelligibility while pinpointing nuances and discrepancies. Navigating diverse stakeholder requests in a dynamic operational setting, Suparn prioritizes annotation tasks based on business imperatives, collaborating closely with Lead ML-DLs. Suparn actively contributes insights to customize annotation guidelines for various data modalities, ensuring adherence to customer expectations.
  • Beyond annotation, Suparn delves into qualitative and quantitative metrics, tracking and reporting individual and group KPIs and SLAs. Demonstrating unwavering commitment to efficiency, Suparn consistently streamlines evaluation workflows, standardizes processes, and communicates proactively with team members to resolve issues promptly. Serving as a mentor, Suparn regularly onboards and trains emerging ML-Data Linguists, cultivating a culture of perpetual learning. In this dynamic role, Suparn contributes to our success through operational excellence and collaborative endeavors.