tlwu committed on
Commit
07b330d
1 Parent(s): 638944e

update doc

Files changed (1)
  1. README.md +3 -16
README.md CHANGED
@@ -12,11 +12,11 @@ tags:
 ---
 
 
-# Stable Diffusion XL 1.0 for ONNX Runtime
+# Stable Diffusion XL 1.0 for ONNX Runtime CUDA
 
 ## Introduction
 
-This repository hosts the optimized versions of **Stable Diffusion XL 1.0** to accelerate inference with ONNX Runtime CUDA execution provider.
+This repository hosts the optimized ONNX models of **Stable Diffusion XL Base 1.0** to accelerate inference with the ONNX Runtime CUDA execution provider on NVIDIA GPUs. These models cannot run with other execution providers such as CPU or DirectML.
 
 The models are generated by [Olive](https://github.com/microsoft/Olive/tree/main/examples/stable_diffusion) with a command like the following:
 ```
@@ -30,23 +30,10 @@ See the [usage instructions](#usage-example) for how to run the SDXL pipeline wi
 - **Developed by:** Stability AI
 - **Model type:** Diffusion-based text-to-image generative model
 - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md)
-- **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) models for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with CUDA execution provider.
+- **Model Description:** This is a conversion of the [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model for [ONNX Runtime](https://github.com/microsoft/onnxruntime) inference with the CUDA execution provider.
 
 The VAE decoder is converted from [sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix). There are slight discrepancies between its output and that of the original VAE, but the decoded images should be [close enough for most purposes](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/discussions/7#64c5c0f8e2e5c94bd04eaa80).
 
-## Performance Comparison
-
-#### Latency for 30 steps base and 9 steps refiner
-
-Below is the average latency of generating a 1024x1024 image on an NVIDIA A100-SXM4-80GB GPU:
-
-| Batch Size | PyTorch 2.1 | ONNX Runtime CUDA |
-|------------|-------------|-------------------|
-| 1          | 3779 ms     | 3389 ms           |
-| 4          | 13504 ms    | 12264 ms          |
-
-In this test, CUDA graph was used to speed up both the torch.compile UNet and ONNX Runtime.
-
 ## Usage Example
 
 Follow the [demo instructions](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md#run-demo-with-docker). Example steps:
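As the updated introduction notes, these models only run with the ONNX Runtime CUDA execution provider. Before following the docker demo linked above, it can be useful to verify that the local `onnxruntime-gpu` build actually exposes that provider. The snippet below is a minimal sketch of such a check; the path `unet/model.onnx` is a hypothetical placeholder for one of the ONNX files in this repository, not a path documented by the README.

```python
import onnxruntime as ort

# Sanity check: the CUDA execution provider must be available
# (requires the onnxruntime-gpu package plus a compatible CUDA/cuDNN install).
assert "CUDAExecutionProvider" in ort.get_available_providers(), \
    "These models require the CUDA execution provider; CPU or DirectML will not work."

# Hypothetical path: substitute the actual ONNX file from this repository.
session = ort.InferenceSession(
    "unet/model.onnx",
    providers=["CUDAExecutionProvider"],  # request the CUDA EP explicitly
)
print(session.get_providers())
```

The documented workflow remains the docker demo above; this check only confirms that the environment can load the models on the GPU.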