daoyuan98 committed
Commit 11ec5ff · verified · 1 Parent(s): 508ed68

Update README.md

Files changed (1)
  1. README.md +8 -5
README.md CHANGED
@@ -3,12 +3,15 @@ license: apache-2.0
 ---
 
 # Flux-Mini
+
+A distilled Flux-dev model for efficient text-to-image generation
+
+
 <div align="center">
 <img src="flux_distill-flux-mini-teaser.jpg" width="800" alt="Teaser image">
 </div>
 
 
-A distilled Flux-dev model for efficient text-to-image generation
 
 
 
@@ -17,10 +20,10 @@ To bridge this gap, we distilled the **12B** `Flux-dev` model into a **3.2B** `F
 Specifically, we prune the original `Flux-dev` by reducing its depth from `19 + 38` (number of double blocks and single blocks) to `5 + 10`.
 The pruned model is further tuned with denoising and feature alignment objectives on a curated image-text dataset.
 
-We empeirically found that different blocks has different impact on the generation quality, thus we initialize the student model with several most important blocks.
-The distillation process consists of three objectives: the denoise loss, the output alignment loss and the feature alignment loss.
-The feature aligement loss is designed in a way such that the output of `block-x` in the student model is encouraged to match that of `block-4x` in the teacher model.
-The distillation process is performed with `512x512` laion images recaptioned with `Qwen-VL` in the first stage for `90k steps`,
+We empirically found that different blocks have different impacts on the generation quality, thus we initialize the student model with several most important blocks.
+The distillation process consists of three objectives: the denoise loss, the output alignment loss as well as the feature alignment loss.
+The feature alignment loss is designed in a way such that the output of `block-x` in the student model is encouraged to match that of `block-4x` in the teacher model.
+The distillation process is performed with `512x512` Laion images recaptioned with `Qwen-VL` in the first stage for `90k steps`,
 and `1024x1024` images generated by `Flux` using the prompts in `JourneyDB` with another `90k steps`.
 
 
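Finally, the two-stage schedule in the README (512px Laion recaptions, then 1024px Flux renders of JourneyDB prompts, 90k steps each) can be summarized as plain data; the key names below are made up for illustration.

```python
# Hypothetical summary of the two-stage distillation schedule described above.
stages = [
    {"resolution": 512,  "steps": 90_000,
     "data": "Laion images recaptioned with Qwen-VL"},
    {"resolution": 1024, "steps": 90_000,
     "data": "images generated by Flux from JourneyDB prompts"},
]
```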
29