Update README.md

README.md CHANGED
@@ -3,12 +3,15 @@ license: apache-2.0
 ---
 
 # Flux-Mini
+
+A distilled Flux-dev model for efficient text-to-image generation
+
+
 <div align="center">
 <img src="flux_distill-flux-mini-teaser.jpg" width="800" alt="Teaser image">
 </div>
 
 
-A distilled Flux-dev model for efficient text-to-image generation
 
 
 
@@ -17,10 +20,10 @@ To bridge this gap, we distilled the **12B** `Flux-dev` model into a **3.2B** `F
 Specifically, we prune the original `Flux-dev` by reducing its depth from `19 + 38` (number of double blocks and single blocks) to `5 + 10`.
 The pruned model is further tuned with denoising and feature alignment objectives on a curated image-text dataset.
 
-We
-The distillation process consists of three objectives: the denoise loss, the output alignment loss
-The feature
-The distillation process is performed with `512x512`
+We empirically found that different blocks have different impacts on generation quality, so we initialize the student model with the most important blocks.
+The distillation process consists of three objectives: the denoise loss, the output alignment loss, and the feature alignment loss.
+The feature alignment loss encourages the output of `block-x` in the student model to match that of `block-4x` in the teacher model.
+The distillation is performed with `512x512` Laion images recaptioned by `Qwen-VL` in the first stage for `90k steps`,
 and `1024x1024` images generated by `Flux` using the prompts in `JourneyDB` with another `90k steps`.
 
 
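The pruning-and-initialization step described in the diff (keeping the most important of the teacher's `19 + 38` blocks to form the `5 + 10` student) can be sketched as below. The function name and the example index choices are illustrative assumptions, not taken from the actual Flux-Mini code:

```python
# Illustrative sketch only: build a pruned student from selected teacher
# blocks. Names and indices are assumptions, not Flux-Mini's code.
def init_student_blocks(teacher_double, teacher_single,
                        keep_double, keep_single):
    """Keep the most important teacher blocks as the student's layers.

    teacher_double: the teacher's 19 double blocks
    teacher_single: the teacher's 38 single blocks
    keep_double / keep_single: indices of the 5 / 10 blocks to keep,
    e.g. chosen by measuring each block's impact on generation quality.
    """
    student_double = [teacher_double[i] for i in keep_double]
    student_single = [teacher_single[i] for i in keep_single]
    return student_double, student_single
```

The student is then fine-tuned from this initialization rather than from scratch.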
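The three distillation objectives (denoise loss, output alignment loss, feature alignment loss) can be sketched as a single scalar loss. Plain MSE for each term, equal default weights, and a 0-based `4 * i` reading of the block-x/block-4x pairing are all assumptions made for illustration here, not the actual Flux-Mini implementation:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))

def distillation_loss(student_feats, teacher_feats,
                      student_out, teacher_out, denoise_target,
                      w_out=1.0, w_feat=1.0):
    """Denoise + output alignment + feature alignment (illustrative).

    student_feats[i] is the hidden state after student block i and is
    aligned with teacher_feats[4 * i], a 0-based version of the
    block-x / block-4x pairing.
    """
    # Denoise loss: regress the student's prediction onto the training target.
    denoise = mse(student_out, denoise_target)
    # Output alignment loss: match the teacher's final prediction.
    out_align = mse(student_out, teacher_out)
    # Feature alignment loss: each student block mirrors a teacher block.
    feat_align = float(np.mean([
        mse(s, teacher_feats[4 * i]) for i, s in enumerate(student_feats)
    ]))
    return denoise + w_out * out_align + w_feat * feat_align
```

Averaging the per-block feature terms keeps the loss scale independent of the student's depth.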