---
license: cc-by-nc-4.0
---

# InstaFlow: 2-Rectified Flow fine-tuned from Stable Diffusion v1.5

2-Rectified Flow is a few-step text-to-image generative model fine-tuned from Stable Diffusion v1.5.

We use text-conditioned reflow as described in [our paper](https://arxiv.org/abs/2309.06380). 

Reflow has interesting theoretical properties; for details, see [this ICLR paper](https://arxiv.org/abs/2209.03003) and [this arXiv paper](https://arxiv.org/abs/2209.14577).
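As a rough illustration of what a reflow objective looks like, here is a minimal pure-Python sketch. It assumes the straight-line regression form from the Rectified Flow papers linked above: interpolate between a noise sample and the image the previous flow generated from it, then regress the velocity onto their difference. The names `reflow_loss`, `velocity_fn`, `pairs`, and `text_emb` are hypothetical and are not taken from the InstaFlow codebase; the actual training code operates on batched latents with a U-Net.

```python
import random


def reflow_loss(velocity_fn, pairs, text_emb, rng):
    """Monte Carlo estimate of a reflow objective over coupled pairs.

    pairs: list of (z0, z1), each a list of floats of equal length, where z1
           was generated from the noise z0 by the previous flow.
    text_emb: conditioning passed through unchanged to velocity_fn
              (hypothetical placeholder for a prompt embedding).
    """
    total = 0.0
    for z0, z1 in pairs:
        t = rng.random()  # interpolation time, uniform on [0, 1)
        # Point on the straight line between the coupled pair.
        x_t = [t * b + (1.0 - t) * a for a, b in zip(z0, z1)]
        # The straight-line direction z1 - z0 is the regression target.
        target = [b - a for a, b in zip(z0, z1)]
        pred = velocity_fn(x_t, t, text_emb)
        total += sum((p - g) ** 2 for p, g in zip(pred, target))
    return total / len(pairs)
```

A velocity function that already predicts `z1 - z0` for every pair drives this loss to zero, which is the sense in which training pushes the flow toward straight paths.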

## Images Generated from Random DiffusionDB Prompts

We compare SD 1.5+DPM-Solver and 2-Rectified Flow on random prompts from DiffusionDB, using the same random seeds for both models. We observe that 2-Rectified Flow is straighter.

| ![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/MXEZ5YQtsnr70XzVnH8gQ.png) |
| :---: |
| **Prompt**: a renaissance portrait of dwayne johnson, art in the style of rembrandt. |

| ![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/dqPdE0JFqNtUnu6wy3ugF.png) |
| :---: |
| **Prompt**: a photo of a rabbit head on a grizzly bear body. |

## Usage

Please refer to the [official github repo](https://github.com/gnobitab/InstaFlow).

## Training

Training pipeline:
1. Reflow (Stage 1): We train the model using the text-conditioned reflow objective with a batch size of 64 for 70,000 iterations. The model is initialized from the pre-trained SD 1.5 weights. (11.2 A100 GPU days)
2. Reflow (Stage 2): We continue to train the model using the text-conditioned reflow objective with an increased batch size of 1024 for 25,000 iterations. (64 A100 GPU days)

The final model is **2-Rectified Flow**.

**Total Training Cost:** Training 2-Rectified Flow takes 75.2 A100 GPU days in total across the two stages.
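The total can be checked from the per-stage figures above. The short sketch below sums the GPU days and also multiplies batch size by iterations to show how many samples each stage processes; the variable names are illustrative only.

```python
# (stage name, batch size, iterations, A100 GPU days), as listed above
stages = [
    ("Reflow (Stage 1)", 64, 70_000, 11.2),
    ("Reflow (Stage 2)", 1024, 25_000, 64.0),
]

total_gpu_days = sum(days for _, _, _, days in stages)
# Samples processed per stage = batch size * iterations.
samples_seen = {name: batch * iters for name, batch, iters, _ in stages}

print(f"total: {total_gpu_days:.1f} A100 GPU days")
for name, count in samples_seen.items():
    print(f"{name}: {count:,} samples processed")
```

Stage 2 runs fewer iterations but, because of the larger batch, processes several times more samples than Stage 1, which is consistent with its larger share of the compute.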


## Evaluation Results - Metrics

The following metrics for 2-Rectified Flow are measured on MS COCO 2017 using 5,000 images and a 25-step Euler solver:

*FID-5k = 21.5, CLIP score = 0.315*

Few-Step performance:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/GS_ApYjpbtmwnICgHOZmD.png)
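For readers unfamiliar with the Euler solver mentioned above, the following is a minimal sketch of fixed-step Euler integration of a velocity field from t = 0 (noise) to t = 1 (sample). It is a toy pure-Python version for illustration; `euler_sample` and `velocity_fn` are hypothetical names, and the real pipeline runs on latent tensors with a text-conditioned network.

```python
def euler_sample(velocity_fn, z0, num_steps):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 with uniform Euler steps."""
    z = list(z0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(z, t)
        # One explicit Euler update: move along the current velocity.
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z


# A constant velocity field has perfectly straight trajectories, so one
# Euler step lands on the same endpoint as 25 steps (up to rounding).
straight = lambda z, t: [3.0, -1.0]
one_step = euler_sample(straight, [0.0, 0.0], 1)
many_steps = euler_sample(straight, [0.0, 0.0], 25)

# A curved field (dz/dt = z) gives different answers at 1 and 25 steps.
curved = lambda z, t: list(z)
curved_one = euler_sample(curved, [1.0], 1)
curved_many = euler_sample(curved, [1.0], 25)
```

This is why straightness matters for few-step sampling: the straighter the learned trajectories, the less accuracy is lost when the step count is reduced.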

## Evaluation Results - Impact of Guidance Scale

We evaluate the impact of the guidance scale on 2-Rectified Flow.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/h_GbLBjnE8tP67Fgzj6ER.png)

Trade-off Curve:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/646b0bbdec9a61e871799339/ldplYcANcoPogbqdOP1p9.png)
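As a reminder of what the guidance scale controls, here is a minimal sketch assuming the standard classifier-free guidance combination of a text-conditioned and an unconditional prediction. The function name `guided_velocity` and its arguments are hypothetical; the exact formulation and the scales used for the plots above are given in the paper and the official repo.

```python
def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Classifier-free guidance (standard form, assumed here): start from the
    unconditional prediction and extrapolate toward the conditioned one."""
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


v_cond = [1.0, 2.0]    # prediction given the text prompt
v_uncond = [0.5, 1.0]  # prediction given an empty prompt

no_guidance = guided_velocity(v_cond, v_uncond, 1.0)      # equals v_cond
strong_guidance = guided_velocity(v_cond, v_uncond, 2.0)  # pushed past v_cond
```

In this form a scale of 1.0 recovers the plain conditional prediction, and larger scales weight the prompt more heavily.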

## Citation
```
@article{liu2023insta,
  title={InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation},
  author={Liu, Xingchao and Zhang, Xiwen and Ma, Jianzhu and Peng, Jian and Liu, Qiang},
  journal={arXiv preprint arXiv:2309.06380},
  year={2023}
}
```