Researcher: Audio (Data)
Cartesia
About Cartesia
Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video and text—1B text tokens, 10B audio tokens and 1T video tokens—let alone do this on-device.
We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team pairs deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.
We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks, and others. We're fortunate to have the support of many amazing advisors and 90+ angels across many industries, including the world's foremost experts in AI.
The Role
• Lead the design and creation of high-quality datasets tailored for training cutting-edge audio models, focusing on tasks such as speech recognition, enhancement, separation, synthesis, and speech-to-speech systems.
• Develop strategies for curating, augmenting, and labeling audio datasets to address challenges like noise, variability, and diverse use cases.
• Design innovative data augmentation and synthetic data generation techniques to enrich training datasets and improve model robustness.
• Create datasets specifically for speech-to-speech systems, focusing on alignment, phonetic variability, and cross-linguistic considerations.
• Collaborate closely with researchers and engineers to understand model requirements and ensure datasets are optimized for specific architecture and task needs.
• Build tools and pipelines for scalable data processing, labeling, and validation to support both research and production workflows (a minimal validation sketch follows below).
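As a flavor of what such pipelines involve, here is a minimal validation-pass sketch in Python. It is illustrative only: the directory layout, the thresholds, and the helper name validate_clip are assumptions rather than part of any existing Cartesia tooling, and it uses the open-source soundfile library for audio I/O.

```python
from pathlib import Path

import numpy as np
import soundfile as sf


def validate_clip(path: Path,
                  min_dur: float = 1.0,
                  max_dur: float = 30.0) -> bool:
    """Cheap validation pass: reject clips that are too short, too long,
    or heavily clipped. All thresholds are illustrative defaults."""
    info = sf.info(str(path))
    duration = info.frames / info.samplerate
    if not (min_dur <= duration <= max_dur):
        return False
    audio, _ = sf.read(str(path), dtype="float32")
    if audio.ndim > 1:  # downmix multichannel to mono
        audio = audio.mean(axis=1)
    # Flag clipping: more than 0.1% of samples sitting at full scale.
    if np.mean(np.abs(audio) >= 0.999) > 1e-3:
        return False
    return True


# Keep only clips that pass validation (hypothetical directory layout).
kept = [p for p in sorted(Path("raw_audio").glob("*.wav")) if validate_clip(p)]
```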
What We’re Looking For
• Deep expertise in audio data processing, with a strong understanding of the challenges involved in creating datasets for tasks like ASR, TTS, or speech-to-speech modeling.
• Experience with audio processing libraries and tools, such as librosa, torchaudio, or custom pipelines for large-scale audio data handling.
• Familiarity with data augmentation techniques for audio, including time-stretching, pitch-shifting, noise addition, and domain-specific methods (see the sketch after this list).
• Strong understanding of dataset quality metrics and techniques to ensure data sufficiency, coverage, and relevance to target tasks.
• Programming skills in Python and experience with frameworks like PyTorch or TensorFlow for integrating data pipelines with model training workflows.
• Comfortable with large-scale data processing and with distributed file systems for storing and processing audio data.
• A collaborative mindset, with the ability to work closely with researchers and engineers to align data design with model objectives.
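To make the augmentation bullet above concrete, here is a minimal sketch of a random augmentation chain built on librosa (one of the libraries named above). The probabilities, parameter ranges, and input filename are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np
import librosa


def augment(y: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    """Apply a random chain of standard augmentations to a mono waveform."""
    # Time-stretch: change speed/duration without changing pitch.
    if rng.random() < 0.5:
        y = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))
    # Pitch-shift by up to +/-2 semitones, keeping duration fixed.
    if rng.random() < 0.5:
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-2.0, 2.0))
    # Additive white noise at a random SNR between 10 and 30 dB.
    if rng.random() < 0.5:
        snr_db = rng.uniform(10.0, 30.0)
        noise = rng.standard_normal(len(y)).astype(y.dtype)
        # Scale noise so signal_power / noise_power == 10**(snr_db / 10).
        scale = np.sqrt(np.mean(y**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
        y = y + scale * noise
    return y


y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file
y_aug = augment(y, sr, np.random.default_rng(0))
```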
Nice-to-Haves
• Experience in creating synthetic datasets using generative models or simulation frameworks.
• Background in multimodal data curation, integrating audio with text, video, or other modalities.
• Early-stage startup experience or experience building datasets for cutting-edge research.
Our culture
🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together, and learning from each other every day.
🚢 We ship fast. All of our work is novel and cutting edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.
🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.
Our perks
🍽 Lunch, dinner and snacks at the office.
🏥 Fully covered medical, dental, and vision insurance for employees.
🏦 401(k).
✈️ Relocation and immigration support.
🦖 Your own personal Yoshi.