Distributed Systems Engineer

Luma AI

Software Engineering
Palo Alto, CA, USA
USD 150k-300k / year + Equity
Posted on Jul 29, 2025

Location

Palo Alto, CA

Employment Type

Full time

Location Type

Hybrid

Department

Applied Research

Compensation

  • Estimated Base Salary $150K – $300K
  • Offers Equity

We are looking for people with strong backgrounds in ML and distributed systems. This role sits within our Research team, working closely with researchers to build the platforms for training our next generation of foundation models.

Responsibilities

  • Work with researchers to scale up the systems required to train our next generation of models on clusters of thousands of GPUs.

  • Profile and optimize our model training codebase to achieve best-in-class hardware efficiency.

  • Build systems to distribute work efficiently across massive GPU clusters.

  • Design and implement methods to robustly train models in the presence of hardware failures.

  • Build tooling to help us better understand problems in our largest training jobs.

Experience

  • 5+ years of work experience.

  • Experience working with multimodal ML pipelines, high-performance computing, and/or low-level systems.

  • Passion for diving deep into systems implementations and understanding their fundamentals to improve their performance and maintainability.

  • Experience building stable and highly efficient distributed systems.

  • Strong generalist Python and software engineering skills, including significant experience with PyTorch.

  • Nice to have: experience working with high-performance C++ or CUDA.

Your application is reviewed by real people.