**Training Procedure**

StableSR is an image super-resolution model finetuned on [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), further equipped with a time-aware encoder and a controllable feature wrapping (CFW) module.

- Following Stable Diffusion, images are encoded through the fixed autoencoder, which turns them into latent representations. With a relative downsampling factor of f = 8, the autoencoder maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- The latent representations are fed to the time-aware encoder as guidance.
- The loss is the same as in Stable Diffusion.
- After finetuning the diffusion model, we further train the CFW module on data generated by the finetuned diffusion model.
- The autoencoder is fixed; only the CFW module is trainable.
- The loss is similar to that used for autoencoder training, except that the adversarial loss weight is fixed at 0.025 rather than self-adjusting.
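The shape mapping used by the fixed autoencoder (downsampling factor f = 8, 4 latent channels) can be sketched as follows; `latent_shape` is an illustrative helper, not part of the StableSR codebase:

```python
def latent_shape(height: int, width: int, f: int = 8, latent_channels: int = 4):
    """Map an H x W x 3 image to the autoencoder's latent shape H/f x W/f x 4."""
    assert height % f == 0 and width % f == 0, "image sides must be divisible by f"
    return (height // f, width // f, latent_channels)

# A 512 x 512 RGB image becomes a 64 x 64 x 4 latent.
print(latent_shape(512, 512))  # (64, 64, 4)
```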
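A minimal sketch of the fixed-weight loss combination described above; the reconstruction and adversarial terms are hypothetical scalars standing in for the actual loss tensors:

```python
ADV_WEIGHT = 0.025  # fixed adversarial loss weight used for CFW training

def cfw_loss(rec_loss: float, adv_loss: float, adv_weight: float = ADV_WEIGHT) -> float:
    """Total CFW loss: a reconstruction term plus a fixed-weight adversarial term.

    Standard (VQGAN-style) autoencoder training instead rescales the
    adversarial term with a self-adjusting weight; here it is a constant.
    """
    return rec_loss + adv_weight * adv_loss
```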

We currently provide the following checkpoints:

- [stablesr_000117.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/stablesr_000117.ckpt): Diffusion model finetuned on [SD2.1-512base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) with the DF2K_OST dataset for 117 epochs.
- [vqgan_cfw_00011.ckpt](https://huggingface.co/Iceclear/StableSR/resolve/main/vqgan_cfw_00011.ckpt): CFW module with fixed autoencoder, trained on synthetic paired data for 11 epochs.
- [stablesr_768v_000139.ckpt](https://huggingface.co/Iceclear/StableSR/blob/main/stablesr_768v_000139.ckpt): Diffusion model finetuned on [SD2.1-768v](https://huggingface.co/stabilityai/stable-diffusion-2-1) with the DF2K_OST dataset for 139 epochs.
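The checkpoint files above can be fetched from their `resolve` URLs; the helper below only assembles those URLs (an actual download, e.g. via `huggingface_hub.hf_hub_download`, is left commented out since it needs network access):

```python
BASE = "https://huggingface.co/Iceclear/StableSR/resolve/main"

def checkpoint_url(filename: str) -> str:
    """Build the direct-download URL for a released StableSR checkpoint."""
    return f"{BASE}/{filename}"

print(checkpoint_url("stablesr_000117.ckpt"))
# https://huggingface.co/Iceclear/StableSR/resolve/main/stablesr_000117.ckpt

# To download (requires network and `pip install huggingface_hub`):
# from huggingface_hub import hf_hub_download
# path = hf_hub_download(repo_id="Iceclear/StableSR", filename="stablesr_000117.ckpt")
```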

## Evaluation Results