patrickvonplaten
committed on
Commit: cc66b03
Parent(s): acd94cc
Update README.md
README.md
CHANGED
@@ -153,20 +153,14 @@ Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder
 - The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
 - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet.

-We currently provide four checkpoints,
-- [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1),
-- [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2),
-- [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3), and
-- [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4).
-
-The checkpoints were trained as follows:
-- `stable-diffusion-v1-1`: 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
+We currently provide four checkpoints, which were trained as follows.
+- [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1): 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
 194,000 steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
-- `stable-diffusion-v1-2
+- [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2): Resumed from `stable-diffusion-v1-1`.
 515,000 steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
 filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
-- `stable-diffusion-v1-3
--
+- [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3): Resumed from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)
+- [**`stable-diffusion-v1-4`**](https://huggingface.co/CompVis/stable-diffusion-v1-4) *To-fill-here*

 - **Hardware:** 32 x 8 x A100 GPUs
 - **Optimizer:** AdamW
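The README text in this diff mentions two training details: a loss between the added noise and the UNet's prediction, and 10 % dropping of the text-conditioning for `stable-diffusion-v1-3`. The sketch below illustrates how these two pieces might look in isolation. It is not the CompVis training code. The use of mean squared error, the idea of swapping in a "null" embedding when conditioning is dropped, and all function names (`maybe_drop_conditioning`, `noise_prediction_loss`, `drop_rate`) are assumptions made for illustration.

```python
import random

# Taken from the v1-3 entry: 10 % dropping of the text-conditioning.
DROP_PROB = 0.10


def maybe_drop_conditioning(text_emb, null_emb, rng, drop_prob=DROP_PROB):
    """Return null_emb with probability drop_prob, otherwise text_emb.

    Assumption: "dropping" means replacing the text embedding with a
    placeholder (for example, the embedding of an empty prompt).
    """
    return null_emb if rng.random() < drop_prob else text_emb


def noise_prediction_loss(noise, predicted_noise):
    """Mean squared error between the added noise and the model's prediction.

    Assumption: the README only says "reconstruction objective"; MSE is a
    common choice for it, not a confirmed detail of this model.
    """
    if len(noise) != len(predicted_noise):
        raise ValueError("noise and prediction must have the same length")
    return sum((n - p) ** 2 for n, p in zip(noise, predicted_noise)) / len(noise)


def drop_rate(num_samples, seed=0, drop_prob=DROP_PROB):
    """Empirical fraction of samples whose conditioning was dropped."""
    rng = random.Random(seed)
    dropped = sum(
        maybe_drop_conditioning("text", "null", rng, drop_prob) == "null"
        for _ in range(num_samples)
    )
    return dropped / num_samples
```

Over many samples, `drop_rate` should land close to `DROP_PROB`, so roughly one training example in ten is seen without its text prompt. This is what lets a single model provide both the conditional and the unconditional prediction that classifier-free guidance combines at sampling time.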