Software Engineer, Data Infrastructure - Research

Luma AI

Software Engineering, Other Engineering
Palo Alto, CA, USA
Posted on Apr 2, 2026
About The Role

As a Data Infrastructure Engineer in Research at Luma, you will play a critical role in building and scaling the data infrastructure that supports our cutting-edge multimodal AI systems. Your work will focus on developing high-throughput, large-scale data processing pipelines tailored for machine learning research and internal ML platform needs. You will collaborate closely with ML researchers and product teams to create reliable, efficient, and easy-to-use data infrastructure that empowers innovation and accelerates development. This role requires a strong foundation in distributed systems and data engineering, with an emphasis on supporting complex machine learning workflows rather than traditional product data infrastructure.

Responsibilities

  • Build and maintain scalable data infrastructure for high-throughput machine learning workflows
  • Collaborate with ML researchers and product teams to ensure data systems meet evolving needs
  • Develop and optimize large-scale data pipelines and batch processing jobs
  • Contribute to the architecture and implementation of reliable, high-performance data platforms
  • Integrate open-source tools into the data infrastructure
  • Participate in cross-functional projects to improve data reliability, scalability, and operational excellence
  • Support the evaluation and adoption of new programming languages and frameworks relevant to data infrastructure
  • Continuously improve data infrastructure through monitoring, troubleshooting, and performance tuning
  • Collaborate with research & engineering teams to help define and refine best practices for data infrastructure development

Qualifications

  • Proficiency in Python (or a similar language, with willingness to learn Python) and experience with large-scale, high-throughput data infrastructure
  • Familiarity with distributed computing frameworks (e.g., Ray, Spark, Beam)
  • Ability to design and optimize data pipelines for ML research and internal teams
  • Strong problem-solving skills and understanding of data engineering at scale
  • Collaborative mindset
  • Experience sourcing, integrating, and optimizing data from diverse and large datasets
  • Comfortable working in a fast-paced, product-focused environment with a strong execution mindset
  • Open to candidates across seniority levels, from mid-level individual contributors to senior engineers and managers

Nice to have

  • Prior experience working with complex data infrastructure or AI/ML platforms
  • Experience with open-source data infrastructure projects