stabilityai/sd-turbo

Text-to-Image · Diffusers · Safetensors · StableDiffusionPipeline


rromb committed on Nov 30, 2023

Commit 70f22a3 (parent: 373dc60)

Update README.md

Files changed (1)

README.md +22 -16

README.md CHANGED

@@ -1,7 +1,6 @@
 ---
-# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
-# Doc / guide: https://huggingface.co/docs/hub/model-cards
-{}
 ---
 # SD-Turbo Model Card
@@ -9,18 +8,22 @@
 <!-- Provide a quick summary of what the model is/does. -->
 ![row01](output_tile.jpg)
 SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation.
 ## Model Details
 ### Model Description
-TODO: ADD DETAILED MODEL DESCRIPTION.
 - **Developed by:** Stability AI
 - **Funded by:** Stability AI
 - **Model type:** Generative text-to-image model
-- **Finetuned from model:** [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5)
 ### Model Sources
@@ -28,14 +31,19 @@ For research purposes, we recommend our `generative-models` Github repository (h
 which implements the most popular diffusion frameworks (both training and inference).
 - **Repository:** https://github.com/Stability-AI/generative-models
-- **Paper:** TODO
 ## Evaluation
-![comparison](comparison.png)
-The chart above evaluates user preference for SD-Turbo over TODO.
-SDXL-Turbo is preferred by human voters in terms of image quality and prompt following.
-For details on the user study, we refer to the [research paper](TODO)
 ## Uses
@@ -62,6 +70,7 @@ The model should not be used in any way that violates Stability AI's [Acceptable
 ## Limitations and Bias
 ### Limitations
 - The generated images are of a fixed resolution (512x512 pix), and the model does not achieve perfect photorealism.
 - The model cannot render legible text.
 - Faces and people in general may not be generated properly.
@@ -74,7 +83,4 @@ The model is intended for research purposes only.
 ## How to Get Started with the Model
-Check out https://github.com/Stability-AI/generative-models

 ---
+pipeline_tag: text-to-image
+inference: false
 ---
 # SD-Turbo Model Card
 <!-- Provide a quick summary of what the model is/does. -->
 ![row01](output_tile.jpg)
 SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a single network evaluation.
+We release SD-Turbo as a research artifact for studying small, distilled text-to-image models. For increased quality and prompt understanding,
+we recommend [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).
 ## Model Details
 ### Model Description
+SD-Turbo is a distilled version of [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1), trained for real-time synthesis.
+SD-Turbo is based on a novel training method called Adversarial Diffusion Distillation (ADD) (see the [technical report](https://stability.ai/research/adversarial-diffusion-distillation)), which allows sampling large-scale foundational
+image diffusion models in 1 to 4 steps at high image quality.
+This approach uses score distillation to leverage large-scale off-the-shelf image diffusion models as a teacher signal and combines this with an
+adversarial loss to ensure high image fidelity even in the low-step regime of one or two sampling steps.
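Schematically, the training objective described above is a weighted sum of the two terms. The notation here is ours, not the card's: $\hat{x}_\theta(x_s, s)$ denotes the student's prediction from a noised input $x_s$ at timestep $s$, and $\lambda$ is an assumed weighting hyperparameter. See the technical report for the exact definitions of each term.

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{adv}}\big(\hat{x}_\theta(x_s, s)\big) + \lambda\,\mathcal{L}_{\text{distill}}\big(\hat{x}_\theta(x_s, s)\big)
$$

Here $\mathcal{L}_{\text{adv}}$ is the adversarial loss that keeps few-step samples at high fidelity, and $\mathcal{L}_{\text{distill}}$ is the score-distillation term that uses a pretrained teacher diffusion model as the target.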
 - **Developed by:** Stability AI
 - **Funded by:** Stability AI
 - **Model type:** Generative text-to-image model
+- **Finetuned from model:** [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1)
 ### Model Sources
 which implements the most popular diffusion frameworks (both training and inference).
 - **Repository:** https://github.com/Stability-AI/generative-models
+- **Paper:** https://stability.ai/research/adversarial-diffusion-distillation
+- **Demo [for the bigger SDXL-Turbo]:** http://clipdrop.co/stable-diffusion-turbo
 ## Evaluation
+![comparison1](image_quality_one_step.png)
+![comparison2](prompt_alignment_one_step.png)
+The charts above evaluate user preference for SD-Turbo over other single- and multi-step models.
+SD-Turbo evaluated at a single step is preferred by human voters in terms of image quality and prompt following over LCM-XL evaluated at four (or fewer) steps.
+In addition, we see that using four steps for SD-Turbo further improves performance.
+**Note:** For increased quality, we recommend the bigger version [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).
+For details on the user study, we refer to the [research paper](https://stability.ai/research/adversarial-diffusion-distillation).
 ## Uses
 ## Limitations and Bias
 ### Limitations
+- The quality and prompt alignment are lower than those of [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo/).
 - The generated images are of a fixed resolution (512x512 pix), and the model does not achieve perfect photorealism.
 - The model cannot render legible text.
 - Faces and people in general may not be generated properly.
 ## How to Get Started with the Model
+Check out https://github.com/Stability-AI/generative-models
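The repository is also tagged for Diffusers (`StableDiffusionPipeline`), so the model can be loaded without the research codebase. The following is a minimal sketch that is not part of this commit. The helper names `turbo_call_kwargs` and `load_sd_turbo` are hypothetical. The sampling settings are assumptions: 1 to 4 steps as described above, classifier-free guidance disabled (`guidance_scale=0.0`), and the fixed 512x512 output resolution.

```python
def turbo_call_kwargs(prompt: str, num_steps: int = 1) -> dict:
    """Build keyword arguments for one SD-Turbo text-to-image call.

    Assumptions (not stated in the model card): guidance is disabled and
    the output size is pinned to the model's native 512x512 resolution.
    """
    # The card describes sampling in 1 to 4 steps; reject anything else.
    if not 1 <= num_steps <= 4:
        raise ValueError(f"num_steps must be in [1, 4], got {num_steps}")
    return {
        "prompt": prompt,
        "num_inference_steps": num_steps,
        "guidance_scale": 0.0,  # assumed: no classifier-free guidance
        "height": 512,
        "width": 512,
    }


def load_sd_turbo(device: str = "cuda"):
    """Load stabilityai/sd-turbo with Diffusers in half precision.

    Imports are deferred so that this module can be imported (and the
    helper above used) without torch or diffusers installed.
    """
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        "stabilityai/sd-turbo", torch_dtype=torch.float16
    )
    return pipe.to(device)


# Usage (requires a CUDA GPU, torch, and diffusers):
#   pipe = load_sd_turbo()
#   image = pipe(**turbo_call_kwargs("a photo of a red fox in snow")).images[0]
#   image.save("sd_turbo.png")
```

Passing `num_steps=4` instead of the default single step trades some speed for the improvement reported in the Evaluation section.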