daoyuan98 committed
Commit 11ec5ff · verified · 1 Parent(s): 508ed68

Update README.md

Files changed (1)
  1. README.md +8 -5
README.md CHANGED
@@ -3,12 +3,15 @@ license: apache-2.0
 ---
 
 # Flux-Mini
+
+A distilled Flux-dev model for efficient text-to-image generation
+
+
 <div align="center">
 <img src="flux_distill-flux-mini-teaser.jpg" width="800" alt="Teaser image">
 </div>
 
 
-A distilled Flux-dev model for efficient text-to-image generation
 
 
 
@@ -17,10 +20,10 @@ To bridge this gap, we distilled the **12B** `Flux-dev` model into a **3.2B** `F
 Specifically, we prune the original `Flux-dev` by reducing its depth from `19 + 38` (number of double blocks and single blocks) to `5 + 10`.
 The pruned model is further tuned with denoising and feature alignment objectives on a curated image-text dataset.
 
-We empeirically found that different blocks has different impact on the generation quality, thus we initialize the student model with several most important blocks.
-The distillation process consists of three objectives: the denoise loss, the output alignment loss and the feature alignment loss.
-The feature aligement loss is designed in a way such that the output of `block-x` in the student model is encouraged to match that of `block-4x` in the teacher model.
-The distillation process is performed with `512x512` laion images recaptioned with `Qwen-VL` in the first stage for `90k steps`,
+We empirically found that different blocks have different impacts on the generation quality, thus we initialize the student model with several most important blocks.
+The distillation process consists of three objectives: the denoise loss, the output alignment loss as well as the feature alignment loss.
+The feature alignment loss is designed in a way such that the output of `block-x` in the student model is encouraged to match that of `block-4x` in the teacher model.
+The distillation process is performed with `512x512` Laion images recaptioned with `Qwen-VL` in the first stage for `90k steps`,
 and `1024x1024` images generated by `Flux` using the prompts in `JourneyDB` with another `90k steps`.
 
 
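Finally, the two-stage schedule in the README (512px Laion recaptions, then 1024px Flux renders of JourneyDB prompts, 90k steps each) can be summarized as plain data; the key names below are made up for illustration.

```python
# Hypothetical summary of the two-stage distillation schedule described above.
stages = [
    {"resolution": 512,  "steps": 90_000,
     "data": "Laion images recaptioned with Qwen-VL"},
    {"resolution": 1024, "steps": 90_000,
     "data": "images generated by Flux from JourneyDB prompts"},
]
```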
29