Member of Technical Staff - Model Serving / API Backend

Black Forest Labs

Software Engineering, IT
United States · Germany · Remote
Posted on Oct 10, 2024

Black Forest Labs is a cutting-edge startup pioneering generative image and video models. Our team, which invented Stable Diffusion, Stable Video Diffusion, and FLUX.1, is currently looking for a strong candidate to join us in developing and improving our API / model serving backend and services.

Role:

  • Develop and maintain robust APIs for serving machine learning models
  • Transform research models into production-ready demos and MVPs
  • Optimize model inference for improved performance and scalability
  • Implement and manage user preference data acquisition systems
  • Ensure high availability and reliability of model serving infrastructure
  • Collaborate with ML researchers to rapidly prototype and deploy new models

Ideal Experience:

  • Strong proficiency in Python and its ecosystem for machine learning, data analysis, and web development
  • Extensive experience with RESTful API development and deployment for ML tasks
  • Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
  • Knowledge of cloud platforms (AWS, GCP, or Azure) for deploying and scaling ML services
  • Proven track record in rapid ML model prototyping using tools like Streamlit or Gradio
  • Experience with distributed task queues and scalable model serving architectures
  • Understanding of monitoring, logging, and observability best practices for ML systems

Nice to have:

  • Experience with frontend development frameworks (e.g., Vue.js, Angular, React)
  • Familiarity with MLOps practices and tools
  • Knowledge of database systems and data streaming technologies
  • Experience with A/B testing and feature flagging in production environments
  • Understanding of security best practices for API development and ML model serving
  • Experience with real-time inference systems and low-latency optimizations
  • Knowledge of CI/CD pipelines and automated testing for ML systems
  • Expertise in ML inference optimizations, including techniques such as:
    • Reducing initialization time and memory requirements
    • Implementing dynamic batching
    • Utilizing reduced precision and weight quantization
    • Applying TensorRT optimizations
    • Performing layer fusion and model compilation
    • Writing custom CUDA code for performance enhancements
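To illustrate one of the techniques above, here is a minimal sketch of dynamic batching: incoming requests are queued and flushed through the model together once the batch fills up or a short timeout elapses. All names (`DynamicBatcher`, `run_model`, the size and timeout parameters) are hypothetical, and `run_model` is a stand-in for an actual model forward pass; a production system would add error handling, backpressure, and GPU-aware scheduling.

```python
import queue
import threading
import time


def run_model(batch):
    # Placeholder for a real batched model forward pass (hypothetical).
    return [x * 2 for x in batch]


class DynamicBatcher:
    """Collects requests and runs them through the model in batches,
    flushing when the batch is full or a wait deadline passes."""

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.requests = queue.Queue()
        worker = threading.Thread(target=self._loop, daemon=True)
        worker.start()

    def submit(self, x):
        """Enqueue one input and block until its result is ready."""
        done = threading.Event()
        slot = {"input": x, "output": None, "done": done}
        self.requests.put(slot)
        done.wait()
        return slot["output"]

    def _loop(self):
        while True:
            # Block for the first request, then gather more until the
            # batch is full or the wait deadline expires.
            batch = [self.requests.get()]
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = run_model([s["input"] for s in batch])
            for slot, out in zip(batch, outputs):
                slot["output"] = out
                slot["done"].set()
```

Callers on separate threads simply call `submit`; the batcher transparently amortizes model invocations across concurrent requests, which is the core idea behind the dynamic batching found in production serving stacks.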