S3Diff Model Card
This model card covers the models associated with S3Diff, available here.
Model Details
Developed by: Aiping Zhang
Model type: Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors
Model Description: This is the model used in Paper.
Resources for more information: GitHub Repository.
Cite as:
@article{2024s3diff,
  author = {Aiping Zhang and Zongsheng Yue and Renjing Pei and Wenqi Ren and Xiaochun Cao},
  title = {Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors},
  journal = {arxiv},
  year = {2024},
}
Limitations and Bias
Limitations
- S3Diff requires tiled processing to generate high-resolution images, which substantially increases inference time.
- S3Diff does not always preserve full fidelity to the input because of its generative nature.
- S3Diff does not always produce perfect details in complex real-world scenarios.
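To illustrate why tiling increases inference time, the following is a minimal sketch of how overlapping tiles might be laid out over a large image. The function names, the tile size of 512, and the overlap of 64 are assumptions chosen for illustration; they are not taken from the S3Diff code. Each tile needs its own forward pass, so the cost grows with the number of tiles.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles that cover [0, length) with the given overlap.

    Hypothetical helper for illustration, not part of S3Diff.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single tile already covers this axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts


def tile_boxes(height, width, tile=512, overlap=64):
    """Return (top, left, bottom, right) boxes for every tile.

    The default tile size and overlap are illustrative assumptions.
    """
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_starts(height, tile, overlap)
        for left in tile_starts(width, tile, overlap)
    ]


if __name__ == "__main__":
    boxes = tile_boxes(1024, 1024)
    print(len(boxes), "tiles, i.e.", len(boxes), "model forward passes")
```

With these assumed settings, a 1024 x 1024 image is split into 9 tiles, whereas an image no larger than one tile needs a single pass.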
Bias
Although our model is built on a pre-trained SD-Turbo model, we have not observed obvious bias in the generated results so far. We conjecture that this is mainly because the model is conditioned on low-resolution images rather than text prompts. Such a strong condition makes the output less likely to be affected.
Training
Training Data The model developer used the following datasets for training the model:
- Our model is finetuned on LSDIR plus 10K samples from the FFHQ dataset.
Training Procedure S3Diff is an image super-resolution model finetuned on SD-Turbo, further equipped with a degradation-guided LoRA and online negative prompting.
- Following SD-Turbo, images are encoded through the fixed autoencoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of f = 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4.
- The LR images are fed to the degradation estimation network, trained by mm-realsr, to predict degradation scores.
- LoRA layers are injected only into the VAE encoder and the UNet.
- The total loss combines an L2 loss, an LPIPS loss, and a GAN loss.
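As a quick check of the autoencoder shapes described in the procedure above, the sketch below computes the latent shape for a given image size using the stated downsampling factor of 8 and 4 latent channels. The function name `latent_shape` is hypothetical, and the divisibility check is an assumption of this sketch rather than a documented requirement of the model.

```python
def latent_shape(height, width, factor=8, latent_channels=4):
    """Map an H x W x 3 image shape to the H/f x W/f x 4 latent shape.

    Hypothetical helper for illustration; factor=8 and 4 channels
    come from the model card text.
    """
    if height % factor or width % factor:
        # Assumption of this sketch: sizes are multiples of the factor.
        raise ValueError("height and width must be divisible by the factor")
    return (height // factor, width // factor, latent_channels)


if __name__ == "__main__":
    print(latent_shape(512, 512))  # (64, 64, 4)
```

For example, a 512 x 512 x 3 image corresponds to a 64 x 64 x 4 latent.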
We currently provide the following checkpoints:
- s3diff.pkl: S3Diff finetuned on SD-Turbo for 30k iterations.
- de_net.pth: The degradation estimation network, extracted from mm-realsr.
Evaluation Results
See Paper for details.