Member of Technical Staff - Pretraining / Inference Optimization

Black Forest Labs

United States · Germany · Remote
Posted on Oct 10, 2024

Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently seeking a strong researcher / engineer to work closely with our research team on pretraining and inference optimization.

Role:

  • Finding ideal training strategies (parallelism, precision trade-offs) for a variety of model sizes and compute budgets
  • Profiling, debugging, and optimizing single- and multi-GPU operations using tools such as Nsight or stack-trace viewers (a minimal profiling sketch follows this list)
  • Reasoning about the speed and quality trade-offs of quantization for model inference
  • Developing and improving low-level kernel optimizations for state-of-the-art inference and training
  • Contributing new ideas that push performance closer to the limits of the GPU
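
As a minimal sketch of the kind of profiling pass described above (assuming a PyTorch workload on a CUDA device; the shapes and the single matmul are illustrative, not part of this posting), torch.profiler can time an operation and rank kernels by GPU time:

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Illustrative sizes; real work would profile the actual training or inference step.
    a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)

    # Warm up so one-time costs (context creation, autotuning) do not skew timings.
    for _ in range(3):
        torch.matmul(a, b)
    torch.cuda.synchronize()

    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        torch.matmul(a, b)
        torch.cuda.synchronize()

    # Rank kernels by GPU time to see where the cycles actually go.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))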

Ideal Experience:

  • Staying current with the latest and most effective techniques for optimizing inference and training workloads
  • Optimizing both memory-bound and compute-bound operations
  • Understanding the GPU memory hierarchy and compute capabilities
  • Understanding efficient attention algorithms in depth
  • Implementing both forward and backward Triton kernels and verifying their correctness while accounting for floating-point error (see the sketch after this list)
  • Using, for example, pybind to integrate custom-written kernels into PyTorch
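
A minimal sketch of that Triton work, assuming Triton is installed and a CUDA device is available: a toy elementwise add kernel (forward pass only), checked against the PyTorch reference with explicit tolerances rather than exact equality, since kernel results can differ from the reference in the last few ulps. The kernel and helper names here are illustrative:

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements  # guard the final, partially filled block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    x = torch.randn(10_000, device="cuda")
    y = torch.randn(10_000, device="cuda")
    # Compare against the PyTorch reference with floating-point tolerances,
    # not bitwise equality.
    assert torch.allclose(add(x, y), x + y, rtol=1e-5, atol=1e-6)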

Nice to have:

  • Experience with diffusion and autoregressive models
  • Experience with low-level CUDA kernel optimization