Robin Rombach, a former core member of Stability AI, has founded a new company, Black Forest Labs, which raised $31 million in seed funding.
Alongside the announcement, the company released a family of image generation models called Flux.1.
The Black Forest Labs Flux.1 model family consists of the following three variants:
1. Flux.1 [pro]
Description:
This is the top-of-the-line version of Flux.1, offering state-of-the-art image generation performance.
Features:
- Prompt following: Accurately follows the user's input prompt when generating images.
- Visual quality: Generated images are highly detailed.
- Output diversity: Performs well across different styles and scene complexities.
Suitable for:
Commercial applications that require top-level image generation quality. It can be accessed via API.
2. Flux.1 [dev]
Description:
- This is an open-weight, guidance-distilled model for non-commercial applications.
Features:
- High efficiency: More efficient than a standard model of the same size.
- Quality and prompt following: Close to the quality and prompt-following capabilities of Flux.1 [pro].
Applicable scenarios:
- Suitable for academic research, development, and non-commercial applications. Model weights can be obtained on Hugging Face.
3. Flux.1 [schnell]
Description:
- This is the fastest model in the Flux.1 family, optimized for local development and personal use.
Features:
- Speed optimization: Has the fastest generation speed in the family.
- Open source: Released under the Apache 2.0 license.
Applicable scenarios:
Suitable for personal projects and rapid prototyping.
FLUX.1 [schnell] is openly available under the Apache 2.0 license. As with FLUX.1 [dev], the weights are available on Hugging Face, and the inference code can be found on GitHub and in Hugging Face’s Diffusers. An integration is also available for ComfyUI.
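To make the Diffusers route concrete, here is a minimal sketch of loading FLUX.1 [schnell] through the `FluxPipeline` class. The model ID, the four-step setting, the zero guidance scale, and the helper names `schnell_call_kwargs` and `generate` are assumptions for illustration, not settings confirmed by this article; check the model card before relying on them.

```python
def schnell_call_kwargs(prompt, seed=0):
    """Build call arguments for a FLUX.1 [schnell] pipeline (values are assumptions)."""
    return {
        "prompt": prompt,
        "guidance_scale": 0.0,       # assumed: the distilled fast model runs without guidance
        "num_inference_steps": 4,    # assumed: a few steps are enough for the fast variant
        "max_sequence_length": 256,  # assumed prompt-length cap for this variant
        "seed": seed,
    }


def generate(prompt, seed=0, model_id="black-forest-labs/FLUX.1-schnell"):
    """Download the weights and render one image. Needs torch, diffusers, and lots of memory."""
    # Heavy imports stay local so the helper above can be used without a GPU stack installed.
    import torch
    from diffusers import FluxPipeline

    kwargs = schnell_call_kwargs(prompt, seed)
    generator = torch.Generator("cpu").manual_seed(kwargs.pop("seed"))
    pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trades speed for lower GPU memory use
    return pipe(generator=generator, **kwargs).images[0]
```

Calling `generate("a cat near the beach wearing sunglasses, photo").save("flux-schnell.png")` would write the result to disk; the first call downloads the weights, so expect it to take a while.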
Architecture Design of Flux.1
The Flux.1 models are based on a hybrid architecture of multimodal and parallel diffusion transformer blocks, with the following key features:
- Multimodal Diffusion Transformer: Processes text and image tokens together, which improves how well generation is conditioned on the prompt.
- Parallel Diffusion Transformer Blocks: Within each block, the attention and feed-forward computations run side by side rather than one after the other, which speeds up training and inference.
Parameter scale
- Number of parameters: The Flux.1 models are scaled to 12B (12 billion) parameters, which gives them the capacity to generate high-quality images.
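To put the 12B figure in hardware terms, the quick calculation below estimates how much memory the weights alone occupy at common precisions. It is a back-of-the-envelope sketch: it counts only the 12 billion parameters mentioned above and ignores text encoders, the image decoder, and activations, so real usage will be higher.

```python
PARAMS = 12_000_000_000  # the 12B parameter count quoted above

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.0f} GB")
```

At 16-bit precision the weights come to about 24 GB, which is why offloading or lower-precision variants matter for consumer GPUs.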
Key Technology Innovation of Flux.1
Flow Matching:
- Description: Flow matching is a general and conceptually simple method for training generative models that includes diffusion as a special case.
- Advantages: With flow matching, the model improves training efficiency and generation speed while maintaining high-quality generation.
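The idea behind flow matching can be sketched in a few lines. The example below uses the straight-line (rectified flow) variant, where a noise sample is blended linearly into a data sample and the network is trained to predict the constant velocity between them. That choice, and the function names, are assumptions for illustration; the article does not state which variant Flux.1 was trained with.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight path; it is the same at every t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)
```

In training, `predict_velocity` would be the neural network, `x0` a fresh noise sample, `x1` a training image (or its latent), and `t` a random time in [0, 1]. Sampling then means integrating the learned velocity from noise toward data.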
Rotary Positional Embeddings:
- Description: Rotary positional embeddings encode each token's position by rotating its query and key vectors, so attention scores depend on the relative offset between positions.
- Advantages: Improves the model's flexibility and accuracy when handling images of different sizes and shapes.
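Here is a small sketch of how a rotary embedding works on a single vector. It uses a one-dimensional position and the common base of 10000, both assumptions made for illustration; an image model would apply the same rotation along its spatial axes, and the article does not describe Flux.1's exact layout.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of vec by an angle that grows with position."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)  # each pair gets its own frequency
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))
```

The useful property is that `dot(rope(q, m), rope(k, n))` depends only on the offset `m - n`, not on the absolute positions, which is what lets the same weights cope with different image sizes.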
Parallel Attention Layers:
- Description: Parallel attention layers compute the attention and feed-forward branches of a transformer block from the same input at the same time, instead of running them one after the other.
- Advantages: Improves the computational efficiency and generation speed of the model.
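The difference between a standard block and a parallel one is easiest to see side by side. The toy sketch below treats `norm`, `attn`, and `mlp` as plain functions on a list of numbers; it is an assumed, simplified picture of the general parallel-layer idea, not Flux.1's actual block.

```python
def sequential_block(x, norm, attn, mlp):
    """Standard block: the MLP has to wait for the attention output."""
    h = [a + b for a, b in zip(x, attn(norm(x)))]
    return [a + b for a, b in zip(h, mlp(norm(h)))]


def parallel_block(x, norm, attn, mlp):
    """Parallel block: both branches read the same normalized input."""
    n = norm(x)
    return [a + b + c for a, b, c in zip(x, attn(n), mlp(n))]


# Toy stand-ins so the two blocks can be compared by hand.
identity = lambda v: list(v)
mean_mix = lambda v: [sum(v) / len(v)] * len(v)  # mixes across positions, like attention
double = lambda v: [2 * a for a in v]            # acts per position, like an MLP
```

Because the two branches in `parallel_block` share one input, their matrix multiplications can be fused or scheduled together, which is where the efficiency gain comes from. The outputs differ from the sequential form, so the model has to be trained with this layout from the start.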
Performance Optimization of Flux.1
Hardware efficiency:
By combining the technical innovations above, Flux.1 is optimized to use hardware efficiently while maintaining high-quality output.
Model variants:
- FLUX.1 [pro]: Targeted at commercial applications, offering top performance and quality.
- FLUX.1 [dev]: Open-weight version suitable for academic and non-commercial applications.
- FLUX.1 [schnell]: Optimized for speed, suitable for personal development and rapid prototyping.
A new benchmark for image synthesis
Visual Quality and Prompt Following:
According to Black Forest Labs, the Flux.1 models surpass popular models such as Midjourney v6.0, DALL·E 3 (HD), and SD3-Ultra in visual quality, prompt following, size/aspect ratio variability, typography, and output diversity.
Output diversity:
The models are specifically fine-tuned to preserve the full output diversity from pre-training, providing richer and more varied generation results.
All FLUX.1 models support a range of aspect ratios and resolutions between 0.1 and 2.0 megapixels.
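To turn a megapixel budget and an aspect ratio into concrete image dimensions, a small helper like the one below can be used. The function name and the rounding down to multiples of 16 are assumptions for illustration (many latent image models expect sizes divisible by a small power of two); only the 0.1 to 2.0 megapixel range comes from the text above.

```python
import math

MIN_MP, MAX_MP = 0.1, 2.0  # supported range quoted above, in megapixels


def flux_dimensions(megapixels, aspect_w, aspect_h, multiple=16):
    """Width and height for a pixel budget and aspect ratio, rounded down to `multiple`."""
    if not MIN_MP <= megapixels <= MAX_MP:
        raise ValueError(f"megapixels must be between {MIN_MP} and {MAX_MP}")
    ratio = aspect_w / aspect_h
    width = math.sqrt(megapixels * 1_000_000 * ratio)
    height = width / ratio

    def snap(value):
        # Round down so the result never exceeds the requested budget.
        return max(multiple, int(value // multiple) * multiple)

    return snap(width), snap(height)
```

For example, `flux_dimensions(1.0, 1, 1)` gives a 992 by 992 square, and `flux_dimensions(2.0, 16, 9)` gives 1872 by 1056, both just under their pixel budgets.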
Practical Usage of Flux.1
- Diverse application scenarios: From commercial image generation to personal project development, the Flux.1 models cover a wide range of uses.
- Open platform and resources: The weights and inference code for the FLUX.1 [dev] and FLUX.1 [schnell] models are publicly available on Hugging Face and GitHub, so developers can use and build on them.
At the same time, the FLUX.1 text-to-image model suite lays the foundation for the team's upcoming competitive text-to-video generation system. The company says its video model will enable precise creation and editing at high definition and unprecedented speed.
Core Team of FLUX.1
Founder and Leader
- Robin Rombach: As the founder and leader of the team, Robin has extensive experience in machine learning and generative AI. He was previously a core member of Stability AI and is a co-author of the Latent Diffusion research behind Stable Diffusion.
Main Researchers
- Victor Irastorza: He has a deep research background in generative model architecture design and algorithm optimization, and has worked in several top research institutions.
- Emma King: Focuses on multimodal learning and image generation technology, has published many important papers, and has gained wide recognition in academia and industry.
- Eric Stone: Has extensive experience in deep learning and model compression, and is committed to improving the computational efficiency and generation quality of models.
Engineering Team
- Cara Lee: Responsible for the engineering implementation and optimization of the model, ensuring that the model runs efficiently on different hardware platforms.
- Ryan Thomas: Focused on the development of large-scale data processing and model training pipelines, improving the training speed and stability of the model.
Contributions and Achievements
- The team's previous work includes VQGAN and Latent Diffusion, the Stable Diffusion models for image and video generation (Stable Diffusion XL, Stable Video Diffusion, Rectified Flow Transformers), and Adversarial Diffusion Distillation for ultra-fast, real-time image synthesis.
Financing and Support
- Major investors: Andreessen Horowitz led the round, with participation from angel investors Brendan Iribe, Michael Ovitz, Garry Tan, Timo Aila, and Vladlen Koltun.
- Follow-on investment: Additional investment from General Catalyst and MätchVC supports the team’s mission to bring the most advanced AI technologies from Europe to users worldwide.
Demonstration effect:
Example 1
Style: portrait
Prompt: Create a captivating portrait of a voluptuous boho woman with green eyes and long, wavy blonde hair, she is standing. She has a fair complexion adorned with delicate freckles, and her expression is contemplative, reflecting a moment of deep thought. She wears a white-colored, off-shoulder linen satin dress, with deep neck linen, complemented by a necklace and various boho jewelry that accentuates her bohemian style., photo, poster, vibrant, portrait photography, fashion
Example 2
Style: surreal
Prompt: pareidolic anamorphosis of a hole in a brick wall morphed into a hublot of a sail boat, a window to the sea.
Example 3
Style: photo
Prompt: a cat sit near the bech with sun glass, photo.
Example 4
Style: satirical
Prompt: Circus tent made out of a worn us flay with text that says not my circus not my clowns. With Biden and trump dressed as clowns in a suit made of the us flag.
Model download: https://huggingface.co/black-forest-labs
Online experience: https://flux1.ai/
Replicate:
FAL:
Official introduction: https://blackforestlabs.ai/announcing-black-forest-labs/
- Author: KCGOD
- URL: https://kcgod.com/flux1
- Copyright: Unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.