Data Engineer
Sanas
This job is no longer accepting applications
Software Engineering, Data Science
Bengaluru, Karnataka, India
Key Responsibilities:
- Build scalable, fault-tolerant pipelines for ingesting, processing, and transforming large volumes of audio and metadata.
- Design and maintain ETL workflows for training and evaluating ML models, using tools like Airflow or custom pipelines.
- Collaborate with ML research scientists to make raw and derived audio features (e.g., spectrograms, MFCCs) efficiently available for training and inference.
- Manage and organize datasets, including labeling workflows, versioning, annotation pipelines, and compliance with privacy policies.
- Implement data quality, observability, and validation checks across critical data pipelines.
- Help optimize data storage and compute strategies for large-scale training.
Qualifications:
- 2–5 years of experience as a Data Engineer, Software Engineer, or in a similar role with a focus on data infrastructure.
- Proficiency in Python and SQL, and experience with distributed data processing tools (e.g., Spark, Dask, Beam).
- Experience with cloud data infrastructure (AWS/GCP), object storage (e.g., S3), and data orchestration tools.
- Familiarity with audio data and its unique challenges (large file sizes, time-series features, metadata handling) is a strong plus.
- Comfortable working in a fast-paced, iterative startup environment where systems are constantly evolving.
- Strong communication skills and a collaborative mindset — you’ll be working cross-functionally with ML, infra, and product teams.
Nice to Have:
- Experience with data for speech models like ASR, TTS, or speaker verification.
- Knowledge of real-time data processing (e.g., Kafka, WebSockets, or low-latency APIs).
- Background in MLOps, feature engineering, or supporting model lifecycle workflows.
- Experience with labeling tools, audio annotation platforms, or human-in-the-loop systems.