Stable Diffusion 1.5 Latent Consistency Model for RKNN2
Run the Stable Diffusion 1.5 LCM image-generation model on RKNPU2!
Inference speed (RK3588, single NPU core):
- 384x384: text encoder 0.05s + U-Net 2.36s/it + VAE decoder 5.48s
- 512x512: text encoder 0.05s + U-Net 5.65s/it + VAE decoder 11.13s
Memory usage:
- 384x384: about 5.2GB
- 512x512: about 5.6GB
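Per-image latency follows directly from these components: one text-encoder pass, one U-Net pass per inference step, and one VAE decode. A minimal sketch using the RK3588 measurements above:

```python
def estimate_latency(text_encoder_s, unet_s_per_it, vae_decoder_s, steps):
    """Rough end-to-end latency: one text-encoder pass, `steps` U-Net
    iterations, and one VAE decode."""
    return text_encoder_s + steps * unet_s_per_it + vae_decoder_s

# 512x512 with 4 LCM inference steps, using the figures above
print(round(estimate_latency(0.05, 5.65, 11.13, steps=4), 2))  # → 33.78
```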
Usage
1. Clone or download this repository.
2. Install dependencies:
pip install diffusers pillow "numpy<2" rknn-toolkit-lite2
3. Run:
python ./run_rknn-lcm.py -i ./model -o ./images --num-inference-steps 4 -s 512x512 --prompt "Majestic mountain landscape with snow-capped peaks, autumn foliage in vibrant reds and oranges, a turquoise river winding through a valley, crisp and serene atmosphere, ultra-realistic style."
Model Conversion
Install dependencies:
pip install diffusers pillow "numpy<2" rknn-toolkit2
1. Download the model
Download a Stable Diffusion 1.5 LCM model in ONNX format and place it in the ./model directory:
huggingface-cli download TheyCallMeHex/LCM-Dreamshaper-V7-ONNX
cp -r -L ~/.cache/huggingface/hub/models--TheyCallMeHex--LCM-Dreamshaper-V7-ONNX/snapshots/4029a217f9cdc0437f395738d3ab686bb910ceea ./model
In theory, you could also obtain an LCM-capable model by merging the LCM LoRA into a regular Stable Diffusion 1.5 model and then converting the result to ONNX. I don't know the exact steps; if you do, please submit a PR.
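As a starting point, diffusers can merge a LoRA into the base weights before export. A hypothetical, untested sketch: the repo IDs `runwayml/stable-diffusion-v1-5` and `latent-consistency/lcm-lora-sdv1-5` and the output path are assumptions, and the subsequent ONNX export step is not shown.

```python
def fuse_lcm_lora(base_id="runwayml/stable-diffusion-v1-5",
                  lora_id="latent-consistency/lcm-lora-sdv1-5",
                  out_dir="./sd15-lcm-fused"):
    """Merge an LCM LoRA into SD 1.5 base weights and save the result,
    ready for a separate ONNX export step. Repo IDs are assumptions."""
    # Lazy import so the sketch can be read without diffusers installed
    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_pretrained(base_id)
    pipe.load_lora_weights(lora_id)  # attach the LCM LoRA
    pipe.fuse_lora()                 # bake the LoRA into the base weights
    pipe.save_pretrained(out_dir)
    return out_dir
```

The saved pipeline could then be exported with, for example, `optimum-cli export onnx --model ./sd15-lcm-fused ./model` (unverified).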
2. Convert the model
# Convert the model at 384x384 resolution
python ./convert-onnx-to-rknn.py -m ./model -r 384x384
Note that the higher the resolution, the larger the model and the longer the conversion time. Very high resolutions are not recommended.
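The resolution drives model size because the U-Net runs on a latent grid 8x smaller than the image: SD 1.5's VAE downsamples by a factor of 8 into 4 latent channels. A small sketch of the resulting U-Net input shape:

```python
def unet_latent_shape(width, height, batch=1):
    """U-Net input shape for SD 1.5: the VAE downsamples the image by 8x
    into 4 latent channels, so the resolution must be a multiple of 8."""
    assert width % 8 == 0 and height % 8 == 0, "resolution must be divisible by 8"
    return (batch, 4, height // 8, width // 8)

print(unet_latent_shape(384, 384))  # → (1, 4, 48, 48)
print(unet_latent_shape(512, 512))  # → (1, 4, 64, 64)
```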
Known Issues
Models converted with rknn-toolkit2 2.2.0 suffer from extremely severe precision loss, even with the fp16 data type. As shown in the image, the top is the result of inference with the ONNX model and the bottom with the RKNN model, all parameters identical; the higher the resolution, the worse the loss. This is a bug in rknn-toolkit2 (fixed in v2.3.0). The conversion script can in principle accept multiple resolutions (e.g. "384x384,256x256"), but this causes the conversion to fail. This is also a rknn-toolkit2 bug.
References
- TheyCallMeHex/LCM-Dreamshaper-V7-ONNX
- Optimum's LatentConsistencyPipeline
- happyme531/RK3588-stable-diffusion-GPU