Synthetic Data Specialist
Cartesia
Location
HQ - San Francisco, CA
Employment Type
Full time
Location Type
On-site
Department
Staff
Compensation
- $180K – $250K • Offers Equity
About Cartesia
Our mission is to build the next generation of AI: ubiquitous, interactive intelligence that runs wherever you are. Today, not even the best models can continuously process and reason over a year-long stream of audio, video, and text (1B text tokens, 10B audio tokens, and 1T video tokens), let alone do this on-device.
We're pioneering the model architectures that will make this possible. Our founding team met as PhDs at the Stanford AI Lab, where we invented State Space Models (SSMs), a new primitive for training efficient, large-scale foundation models. Our team pairs deep expertise in model innovation and systems engineering with a design-minded product engineering team to build and ship cutting-edge models and experiences.
We're funded by leading investors at Index Ventures and Lightspeed Venture Partners, along with Factory, Conviction, A Star, General Catalyst, SV Angel, Databricks and others. We're fortunate to have the support of many amazing advisors, and 90+ angels across many industries, including the world's foremost experts in AI.
The Role
The future of AI training will be built on a foundation of high-quality synthetic data. We are looking for a creative and resourceful Synthetic Data Specialist to design and build the systems that generate training data at an unprecedented scale. This is a unique, high-impact role, where you will solve critical data bottlenecks and directly accelerate our research progress.
What you’ll do
Evaluate the fidelity, diversity, and usefulness of synthetic data across LLMs, audio generation, and audio understanding.
Implement techniques for steering data generation to improve model intelligence and mitigate bias.
Build automated quality control systems to validate and filter generated data.
Design large-scale synthetic datasets to develop model capabilities.
Stay on the cutting edge of research in synthetic data generation, data augmentation, and generative models.
What we’re looking for
Experience with generative models (speech, text, or multimodal).
Strong applied ML background with a focus on data-centric approaches.
Understanding of evaluation methods for synthetic data quality.
Excitement for building scalable systems that bridge research and production.
Familiarity with building large-scale distributed systems for synthetic data generation.
Our culture
🏢 We’re an in-person team based out of San Francisco. We love being in the office, hanging out together and learning from each other every day.
🚢 We ship fast. All of our work is novel and cutting-edge, and execution speed is paramount. We have a high bar, and we don’t sacrifice quality or design along the way.
🤝 We support each other. We have an open and inclusive culture that’s focused on giving everyone the resources they need to succeed.
Our perks
🍽 Lunch, dinner and snacks at the office.
🏥 Fully covered medical, dental, and vision insurance for employees.
🏦 401(k).
✈️ Relocation and immigration support.
🦖 Your own personal Yoshi.