---

# Stable Diffusion XL 1.0 for ONNX Runtime CUDA
## Introduction

This repository hosts the optimized ONNX models of **Stable Diffusion XL Base 1.0** to accelerate inference with the ONNX Runtime CUDA execution provider on NVIDIA GPUs. The models cannot run on other execution providers such as CPU or DirectML.
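
For illustration, creating an inference session for one of these models requires the CUDA execution provider. A minimal sketch, assuming a hypothetical `unet/model.onnx` path inside this repository:

```python
import onnxruntime as ort

# These optimized models only load with the CUDA execution provider;
# requesting the CPU or DirectML providers will fail to run them.
session = ort.InferenceSession(
    "unet/model.onnx",  # hypothetical path; adjust to the actual file layout
    providers=["CUDAExecutionProvider"],
)
```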

The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with a command like the following:

```
# representative invocation of the linked Olive example (script name and flags are an assumption)
python stable_diffusion_xl.py --provider cuda --optimize
```

See the [usage instructions](#usage-example) for how to run the SDXL pipeline with these models.

- **Developed by:** Stability AI
- **Model type:** Diffusion-based text-to-image generative model
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md)
- **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with the CUDA execution provider.

The VAE decoder is converted from [sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix). There are slight discrepancies between its output and that of the original VAE, but the decoded images should be [close enough for most purposes](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/discussions/7#64c5c0f8e2e5c94bd04eaa80).
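
As an illustration, decoding latents with the converted VAE decoder might look like the sketch below. The file path, input name, and fp16 dtype are assumptions; 0.13025 is SDXL's published VAE scaling factor:

```python
import numpy as np
import onnxruntime as ort

vae = ort.InferenceSession(
    "vae_decoder/model.onnx",  # hypothetical path; adjust to the actual file layout
    providers=["CUDAExecutionProvider"],
)
# SDXL latents for a 1024x1024 image have shape (batch, 4, 128, 128).
latents = np.random.randn(1, 4, 128, 128).astype(np.float16)
# Undo SDXL's VAE scaling factor before decoding; "latent_sample" is the
# input name used by diffusers-style ONNX exports (an assumption here).
image = vae.run(None, {"latent_sample": latents / np.float16(0.13025)})[0]
```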
## Performance Comparison

#### Latency for 30 base steps and 9 refiner steps

Below is the average latency of generating an image of size 1024x1024 on an NVIDIA A100-SXM4-80GB GPU:

| Batch Size | PyTorch 2.1 | ONNX Runtime CUDA |
|------------|-------------|-------------------|
| 1          | 3779 ms     | 3389 ms           |
| 4          | 13504 ms    | 12264 ms          |

In this test, CUDA graphs were used for speedup in both runs: torch.compile was applied to the UNet in PyTorch, and CUDA graph capture was enabled in ONNX Runtime.
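
For reference, ONNX Runtime exposes CUDA graph capture as a CUDA execution provider option. A hedged sketch of enabling it (the model path is hypothetical, and inputs/outputs must stay in fixed device buffers across runs, e.g. via I/O binding):

```python
import onnxruntime as ort

# Capture and replay the model's CUDA kernels as a single graph to cut
# per-launch overhead; requires stable input/output buffers between runs.
session = ort.InferenceSession(
    "unet/model.onnx",  # hypothetical path
    providers=[("CUDAExecutionProvider", {"enable_cuda_graph": "1"})],
)
```

On the PyTorch side, `torch.compile(unet, mode="reduce-overhead")` similarly uses CUDA graphs to reduce kernel launch overhead.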
## Usage Example

Follow the [demo instructions](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md#run-demo-with-docker) to run these models with Docker; that README lists the example steps.