tlwu committed on
Commit
07b330d
1 Parent(s): 638944e

update doc

Files changed (1)
  1. README.md +3 -16
README.md CHANGED
@@ -12,11 +12,11 @@ tags:
 ---
 
 
-# Stable Diffusion XL 1.0 for ONNX Runtime
+# Stable Diffusion XL 1.0 for ONNX Runtime CUDA
 
 ## Introduction
 
-This repository hosts the optimized versions of **Stable Diffusion XL 1.0** to accelerate inference with ONNX Runtime CUDA execution provider.
+This repository hosts the optimized ONNX models of **Stable Diffusion XL Base 1.0** to accelerate inference with the ONNX Runtime CUDA execution provider on NVIDIA GPUs. These models cannot run with other execution providers such as CPU or DirectML.
 
 The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with a command like the following:
 ```
@@ -30,23 +30,10 @@ See the [usage instructions](#usage-example) for how to run the SDXL pipeline wi
 - **Developed by:** Stability AI
 - **Model type:** Diffusion-based text-to-image generative model
 - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md)
-- **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) models for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with CUDA execution provider.
+- **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with the CUDA execution provider.
 
 The VAE decoder is converted from [sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix). There are slight discrepancies between its output and that of the original VAE, but the decoded images should be [close enough for most purposes](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/discussions/7#64c5c0f8e2e5c94bd04eaa80).
 
-## Performance Comparison
-
-#### Latency for 30 steps base and 9 steps refiner
-
-Below is the average latency of generating a 1024x1024 image on an NVIDIA A100-SXM4-80GB GPU:
-
-| Batch Size | PyTorch 2.1 | ONNX Runtime CUDA |
-|------------|-------------|-------------------|
-| 1          | 3779 ms     | 3389 ms           |
-| 4          | 13504 ms    | 12264 ms          |
-
-In this test, CUDA graph was used to speed up both the torch.compile UNet and ONNX Runtime.
-
 ## Usage Example
 
 Follow the [demo instructions](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md#run-demo-with-docker). Example steps:
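As the updated introduction notes, these models only run with the ONNX Runtime CUDA execution provider. Before following the docker demo linked above, it can be useful to verify that the local `onnxruntime-gpu` build actually exposes that provider. The snippet below is a minimal sketch of such a check; the path `unet/model.onnx` is a hypothetical placeholder for one of the ONNX files in this repository, not a path documented by the README.

```python
import onnxruntime as ort

# Sanity check: the CUDA execution provider must be available
# (requires the onnxruntime-gpu package plus a compatible CUDA/cuDNN install).
assert "CUDAExecutionProvider" in ort.get_available_providers(), \
    "These models require the CUDA execution provider; CPU or DirectML will not work."

# Hypothetical path: substitute the actual ONNX file from this repository.
session = ort.InferenceSession(
    "unet/model.onnx",
    providers=["CUDAExecutionProvider"],  # request the CUDA EP explicitly
)
print(session.get_providers())
```

The documented workflow remains the docker demo above; this check only confirms that the environment can load the models on the GPU.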