Update README.md
README.md CHANGED
@@ -8,7 +8,7 @@ We have trained the first multilingual Stable Diffusion (SD) model that supports
 
 As shown in Figure 1, the training process consists of two stages: concept alignment and quality improvement. We first replaced the original OpenCLIP in SD with the multilingual CLIP AltCLIP-m18 and froze its parameters. In the first stage, we trained the k,v matrices in the CrossAttention layer of the Unet model to align the concepts between text and image using 256\*256 image resolution. In the second stage, we trained all the parameters in the Unet model to improve the generation performance using 512\*512 image resolution.
 
-
+![illustrate for AltDiffusion](./imgs/model.png)
 
 <center>
 图1: AltDiffusion示意图 (Fig.1: illustration of AltDiffusion)
@@ -39,7 +39,7 @@ checkpoint we used is SD v2.1 512-base-ema. We also use Xformer and Efficient At
 
 ### 中文效果 (Chinese results)
 
-
+![chinese_samples](./imgs/chinese_samples.png)
 
 ### 长图效果 (Long-image results)
 ![long1](./imgs/long1.SVG)
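
As a side note, the stage-1 recipe described in the diffed paragraph above (freeze the text encoder, train only the k,v projections of the UNet cross-attention layers at 256\*256) can be sketched as follows. This is a minimal sketch assuming the diffusers implementation of SD v2.1; the checkpoint name and the layer naming (`attn2`, `to_k`, `to_v`) follow diffusers conventions and are not taken from the AltDiffusion training code.

```python
# Minimal stage-1 setup sketch: freeze everything in the UNet, then
# unfreeze only the key/value projections of the cross-attention blocks.
from diffusers import UNet2DConditionModel

# Assumed base checkpoint; the README names SD v2.1 512-base-ema.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", subfolder="unet"
)

# Freeze all UNet parameters first.
for p in unet.parameters():
    p.requires_grad = False

# In diffusers' naming, "attn2" is the cross-attention module of each
# transformer block; re-enable gradients for its k and v projections only.
for name, module in unet.named_modules():
    if name.endswith("attn2"):
        for p in module.to_k.parameters():
            p.requires_grad = True
        for p in module.to_v.parameters():
            p.requires_grad = True

trainable = [p for p in unet.parameters() if p.requires_grad]
print(f"trainable tensors: {len(trainable)}")
```

For stage 2, the same loop would simply set `requires_grad = True` on every UNet parameter, while the AltCLIP-m18 text encoder stays frozen throughout both stages.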