Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ We have trained the first multilingual Stable Diffusion (SD) model that supports
 
 As shown in Figure 1, the training process consists of two stages: concept alignment and quality improvement. We first replaced the original OpenCLIP in SD with the multilingual CLIP AltCLIP-m18 and froze its parameters. In the first stage, we trained the k,v matrices in the CrossAttention layer of the Unet model to align the concepts between text and image using 256\*256 image resolution. In the second stage, we trained all the parameters in the Unet model to improve the generation performance using 512\*512 image resolution.
 
-
+![illustrate for AltDiffusion](./imgs/model.png)
 
 <center>
 图1: AltDiffusion示意图 (Fig.1: illustration of AltDiffusion)
@@ -39,7 +39,7 @@ checkpoint we used is SD v2.1 512-base-ema. We also use Xformer and Efficient At
 
 ### 中文效果 (Chinese results)
 
-
+![chinese_samples](./imgs/chinese_samples.png)
 
 ### 长图效果 (Long-image results)
 ![long1](./imgs/long1.SVG)
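
As a side note, the stage-1 recipe described in the diffed paragraph above (freeze the text encoder, train only the k,v projections of the UNet cross-attention layers at 256\*256) can be sketched as follows. This is a minimal sketch assuming the diffusers implementation of SD v2.1; the checkpoint name and the layer naming (`attn2`, `to_k`, `to_v`) follow diffusers conventions and are not taken from the AltDiffusion training code.

```python
# Minimal stage-1 setup sketch: freeze everything in the UNet, then
# unfreeze only the key/value projections of the cross-attention blocks.
from diffusers import UNet2DConditionModel

# Assumed base checkpoint; the README names SD v2.1 512-base-ema.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", subfolder="unet"
)

# Freeze all UNet parameters first.
for p in unet.parameters():
    p.requires_grad = False

# In diffusers' naming, "attn2" is the cross-attention module of each
# transformer block; re-enable gradients for its k and v projections only.
for name, module in unet.named_modules():
    if name.endswith("attn2"):
        for p in module.to_k.parameters():
            p.requires_grad = True
        for p in module.to_v.parameters():
            p.requires_grad = True

trainable = [p for p in unet.parameters() if p.requires_grad]
print(f"trainable tensors: {len(trainable)}")
```

For stage 2, the same loop would simply set `requires_grad = True` on every UNet parameter, while the AltCLIP-m18 text encoder stays frozen throughout both stages.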