Research Scientist / Engineer — Foundation Model (Image / Video)
Luma AI
California, USA · Remote
USD 187,500-395,000 / year
What You'll Do
- Research and Define the next frontier of multimodal capabilities, identifying key gaps in our current models and designing the experiments to solve them.
- Design and Execute novel experiments, datasets, and methodologies to systematically improve model performance across vision, audio, and language.
- Develop and Pioneer new evaluation frameworks and benchmarking approaches to precisely measure novel multimodal behaviors and capabilities.
- Collaborate Deeply with other research teams to translate your findings into our core training recipes and unlock new product experiences.
- Build and Prototype compelling demonstrations that showcase the groundbreaking multimodal capabilities you have unlocked.
Who You Are
- You have a PhD or equivalent research experience in a field related to AI, Machine Learning, or Computer Science.
- You have strong programming skills in Python and deep, hands-on experience with PyTorch.
- You have a proven track record of working with multimodal data pipelines and curating large-scale datasets for research.
- You possess a deep, fundamental understanding of at least one of the core modalities: computer vision, audio processing, or natural language processing.
- You thrive on tackling the most ambitious, open-ended research challenges in a fast-paced, collaborative environment.
What Sets You Apart (Bonus Points)
- Direct expertise working with complex, interleaved multimodal data (video, audio, text).
- Hands-on experience training from scratch or fine-tuning Vision Language Models (VLMs), Audio Language Models, or large-scale generative video models.
- A strong publication record in top-tier AI conferences (e.g., NeurIPS, ICML, CVPR, ICLR).
- Experience leading ambitious, open-ended research projects from ideation to tangible results.