	---
	license: creativeml-openrail-m
	language:
	- en
	tags:
	- art
	- Stable Diffusion
	---
	## Model Card for lyraSD

	We consider Diffusers the more extensible framework for the SD ecosystem. We have therefore pivoted to Diffusers, which led to a complete update of lyraSD.

	lyraSD is currently the fastest Stable Diffusion model available whose outputs align 100% with those of Diffusers. It generates a 512x512 image in only 0.36 seconds, up to 50% faster than the original version.

	Among its main features are:

	- All commonly used SD1.5 and SDXL pipelines:
	  - Text2Img
	  - Img2Img
	  - Inpainting
	  - ControlNetText2Img
	  - ControlNetImg2Img
	  - IpAdapterText2Img
	- Fast ControlNet hot swap: swaps a ControlNet model's weights within 0.6s
	- Fast LoRA hot swap: swaps a LoRA within 0.14s
	- 100% likeness to Diffusers output
	- Supported devices: any GPU with SM version >= 80, for example the Nvidia Ampere architecture (A2, A10, A16, A30, A40, A100), RTX 4090, RTX 3080, etc.
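One way to check the SM requirement above is to compare the GPU's compute capability against 8.0. The sketch below is not part of lyraSD: `meets_sm_requirement` and `current_gpu_capability` are hypothetical helper names, and the live check assumes PyTorch with CUDA support is installed.

```python
from typing import Optional, Tuple

MIN_SM = 80  # lyraSD requires SM version >= 80 (see the feature list above)


def meets_sm_requirement(major: int, minor: int, min_sm: int = MIN_SM) -> bool:
    """Return True if compute capability major.minor is at least min_sm (8.0 -> 80)."""
    return major * 10 + minor >= min_sm


def current_gpu_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the default CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = current_gpu_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(f"SM {cap[0]}{cap[1]} supported: {meets_sm_requirement(*cap)}")
```

Keeping the comparison in a pure function makes it usable without a GPU, for instance when validating a deployment target from its published compute capability.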

	## Speed

	### Test environment

	- Device: Nvidia A100 40G
	- Nvidia driver version: 525.105.17
	- Nvidia CUDA version: 12.0
	- Precision: fp16
	- Steps: 20
	- Sampler: EulerA
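Latency figures of this kind are usually taken after a few warm-up runs and averaged over several iterations. The following is a minimal timing sketch under that assumption, not the authors' benchmark script; `benchmark`, `warmup`, and `iters` are illustrative names. When timing GPU work, the callable should block until the result is ready (for example by calling `torch.cuda.synchronize()` inside it), otherwise only the launch time is measured.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 2, iters: int = 5) -> Dict[str, float]:
    """Time fn() over `iters` runs after `warmup` untimed runs; values are in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as lazy initialisation
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(timings),
        "min": min(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the pipeline under test.
    stats = benchmark(lambda: sum(range(100_000)))
    print(f"mean {stats['mean']:.6f}s  min {stats['min']:.6f}s  max {stats['max']:.6f}s")
```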

	### SD1.5 Text2Img Performance
	![Alt text](images/sd_txt2img.png)

	### SD1.5 ControlNet-Text2Img Performance
	![Alt text](images/sd_controlnet_txt2img.png)

	### SDXL Text2Img Performance
	![Alt text](images/sdxl_txt2img.png)

	### SDXL ControlNet-Text2Img Performance
	![Alt text](images/sdxl_controlnet_txt2img.png)

	### SD Model Load Performance
	![Alt text](images/model_load_performance.png)

	## Model Sources

	SD1.5
	- Checkpoint: https://civitai.com/models/7371/rev-animated
	- ControlNet: https://huggingface.co/lllyasviel/sd-controlnet-canny
	- Lora: https://civitai.com/models/18323?modelVersionId=46846

	SDXL
	- Checkpoint: https://civitai.com/models/43977?modelVersionId=227916
	- ControlNet: https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0
	- Lora: https://civitai.com/models/245889/dissolve-style-lora-15sdxl

	## SD1.5 Text2Img Usage

	```python
	import torch
	import time

	from lyrasd_model import LyraSdTxt2ImgPipeline

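	# Path to the model files. It should contain the following structure (same as diffusers):
	# 1. the CLIP model
	# 2. the converted, optimized UNet model, placed in the unet_bins folder inside it
	# 3. the VAE model
	# 4. the scheduler config

	# LyraSD's compiled C++ shared library, which contains the C++/CUDA computation details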
	lib_path = "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu12_sm80.so"
	model_path = "./models/rev-animated"
	lora_path = "./models/xiaorenshu.safetensors"

	torch.classes.load_library(lib_path)

	# Build the Txt2Img pipeline
	model = LyraSdTxt2ImgPipeline()

	model.reload_pipe(model_path)

	# Load a LoRA
	# Arguments: LoRA model path, LoRA name, LoRA strength
	model.load_lora_v2(lora_path, "xiaorenshu", 0.4)

	# Prepare the inputs and hyperparameters
	prompt = "a cat, cute, cartoon, concise, traditional, chinese painting, Tang and Song Dynasties, masterpiece, 4k, 8k, UHD, best quality"
	negative_prompt = "(((horrible))), (((scary))), (((naked))), (((large breasts))), high saturation, colorful, human:2, body:2, low quality, bad quality, lowres, out of frame, duplicate, watermark, signature, text, frames, cut, cropped, malformed limbs, extra limbs, (((missing arms))), (((missing legs)))"
	height, width = 512, 512
	steps = 20
	guidance_scale = 7
	generator = torch.Generator().manual_seed(123)
	num_images = 1

	start = time.perf_counter()
	# Run inference
	images = model(prompt, height, width, steps,
	               guidance_scale, negative_prompt, num_images,
	               generator=generator)
	print("image gen cost: ", time.perf_counter() - start)
	# Save the generated images
	for i, image in enumerate(images):
	    image.save(f"outputs/res_txt2img_lora_{i}.png")

	# Unload the LoRA. Arguments: LoRA name, whether to clear the LoRA cache
	model.unload_lora_v2("xiaorenshu", True)
	```
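Because loading and unloading a LoRA come in pairs, the two calls can be wrapped in a context manager so the LoRA is unloaded even if generation raises an error. This is a sketch, not part of the lyraSD API: `lora_applied` is a hypothetical helper, and it assumes only the `load_lora_v2(path, name, strength)` and `unload_lora_v2(name, clear_cache)` calls shown in the example above.

```python
from contextlib import contextmanager


@contextmanager
def lora_applied(model, lora_path, name, strength, clear_cache=True):
    """Load a LoRA for the duration of a `with` block, then unload it."""
    model.load_lora_v2(lora_path, name, strength)
    try:
        yield model
    finally:
        # Runs on success and on error, so the LoRA never stays attached.
        model.unload_lora_v2(name, clear_cache)
```

With the pipeline from the example above, usage would look like `with lora_applied(model, lora_path, "xiaorenshu", 0.4): images = model(...)`.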

	## SDXL Text2Img Usage

	```python
	import torch
	import time

	from lyrasd_model import LyraSdXLTxt2ImgPipeline

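	# Path to the model files. It should contain the following structure:
	# 1. the CLIP model
	# 2. the converted, optimized UNet model, placed in the unet_bins folder inside it
	# 3. the VAE model
	# 4. the scheduler config

	# LyraSD's compiled C++ shared library, which contains the C++/CUDA computation details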
	lib_path = "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu12_sm80.so"
	model_path = "./models/helloworldSDXL20Fp16"
	lora_path = "./models/dissolve_sdxl.safetensors"

	torch.classes.load_library(lib_path)

	# Build the Txt2Img pipeline
	model = LyraSdXLTxt2ImgPipeline()

	model.reload_pipe(model_path)

	# Load a LoRA
	# Arguments: LoRA model path, LoRA name, LoRA strength
	model.load_lora_v2(lora_path, "dissolve_sdxl", 0.4)

	# Prepare the inputs and hyperparameters
	prompt = "a cat, cute, cartoon, concise, traditional, chinese painting, Tang and Song Dynasties, masterpiece, 4k, 8k, UHD, best quality"
	negative_prompt = "(((horrible))), (((scary))), (((naked))), (((large breasts))), high saturation, colorful, human:2, body:2, low quality, bad quality, lowres, out of frame, duplicate, watermark, signature, text, frames, cut, cropped, malformed limbs, extra limbs, (((missing arms))), (((missing legs)))"
	height, width = 512, 512
	steps = 20
	guidance_scale = 7
	generator = torch.Generator().manual_seed(123)
	num_images = 1

	start = time.perf_counter()
	# Run inference
	images = model(
	    prompt,
	    height=height,
	    width=width,
	    num_inference_steps=steps,
	    num_images_per_prompt=num_images,
	    guidance_scale=guidance_scale,
	    negative_prompt=negative_prompt,
	    generator=generator,
	)
	print("image gen cost: ", time.perf_counter() - start)
	# Save the generated images
	for i, image in enumerate(images):
	    image.save(f"outputs/res_txt2img_xl_lora_{i}.png")

	# Unload the LoRA. Arguments: LoRA name, whether to clear the LoRA cache
	model.unload_lora_v2("dissolve_sdxl", True)
	```
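Both examples assume that the shared library and the model directory exist at the given paths. A small pre-flight check can turn a missing file into a clear message before `torch.classes.load_library` is called. The sketch below is not part of lyraSD: `check_paths` is a hypothetical name, and of the expected model sub-folders only `unet_bins` is named in the comments above, so it is the only one checked by default. Its location directly under the model directory is also an assumption.

```python
from pathlib import Path
from typing import Iterable, List


def check_paths(lib_path: str, model_path: str,
                required_subdirs: Iterable[str] = ("unet_bins",)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not Path(lib_path).is_file():
        problems.append(f"shared library not found: {lib_path}")
    model_dir = Path(model_path)
    if not model_dir.is_dir():
        problems.append(f"model directory not found: {model_path}")
    else:
        for sub in required_subdirs:
            # Assumption: sub-folders sit directly under the model directory.
            if not (model_dir / sub).is_dir():
                problems.append(f"missing sub-folder: {model_dir / sub}")
    return problems


if __name__ == "__main__":
    for problem in check_paths(
        "./lyrasd_model/lyrasd_lib/libth_lyrasd_cu12_sm80.so",
        "./models/helloworldSDXL20Fp16",
    ):
        print("warning:", problem)
```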

	## Demo output

	### Text2Img
	#### SD1.5 Text2Img
	![text2img_demo](./outputs/res_txt2img_0.png)

	#### SD1.5 Text2Img with Lora
	![text2img_demo](./outputs/res_txt2img_lora_0.png)

	#### SDXL Text2Img
	![text2img_demo](./outputs/res_sdxl_txt2img_0.png)

	#### SDXL Text2Img with Lora
	![text2img_demo](./outputs/res_txt2img_xl_lora_0.png)


	<!-- ### Img2Img

	#### Img2Img input
	<img src="https://chuangxin-research-1258344705.cos.ap-guangzhou.myqcloud.com/share/files/seaside_town.png?q-sign-algorithm=sha1&q-ak=AKIDBF6i7GCtKWS8ZkgOtACzX3MQDl37xYty&q-sign-time=1692601590;1865401590&q-key-time=1692601590;1865401590&q-header-list=&q-url-param-list=&q-signature=ca04ca92d990d94813029c0d9ef29537e5f4637c" alt="img2img input" width="512"/>

	#### Img2Img output
	![text2img_demo](./outputs/res_img2img_0.png) -->

	### ControlNet Text2Img

	#### Control Image
	![text2img_demo](./control_bird_canny.png)

	#### SD1.5 ControlNet Text2Img Output
	![text2img_demo](./outputs/res_controlnet_txt2img_0.png)

	#### SDXL ControlNet Text2Img Output
	![text2img_demo](./outputs/res_controlnet_sdxl_txt2img_0.png)


	## Docker Environment Recommendation

	- For CUDA 12.X, we recommend ```yibolu96/lyrasd_workspace:1.0.0```

	```bash
	docker pull yibolu96/lyrasd_workspace:1.0.0
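	# Mount the current directory (the lyraSD repository) at /lyraSD
	docker run --rm -it --gpus all -v "$(pwd)":/lyraSD yibolu96/lyrasd_workspace:1.0.0

	# Then, inside the container, from the mounted lyraSD directory:
	pip install -r requirements.txt
	python txt2img_demo.py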
	```

	## Citation
	```bibtex
	@Misc{lyraSD_2024,
	author = {Kangjian Wu and Zhengtao Wang and Yibo Lu and Haoxiong Su and Sa Xiao and Bin Wu},
	title = {lyraSD: Accelerating Stable Diffusion with best flexibility},
	howpublished = {\url{https://huggingface.co/TMElyralab/lyraSD}},
	year = {2024}
	}
	```

	## Reporting bugs
	- Start a discussion to report any bugs: https://huggingface.co/TMElyralab/lyraSD/discussions
	- Mark bug reports with `[bug]` in the title.