	---
	license: apache-2.0
	tags:
	- pytorch
	- diffusers
	- text-to-image
	---

	# Chinese Latent Diffusion Model

	We have open-sourced a Chinese Latent Diffusion model that generates high-quality illustrations for classical Chinese poetry.

	* GitHub: [EasyNLP](https://github.com/alibaba/EasyNLP)

	## Model Overview

	The model consists of four parts:

	* Text Encoder: converts Chinese text input into embedding vectors
	* Latent Diffusion Model: processes randomly generated noise in latent space, conditioned on the text input
	* Autoencoder: decodes tensors in latent space back into images
	* Super Resolution: increases the image resolution

	We use the Chinese [CLIP-ViT-L](https://wukong-dataset.github.io/wukong-dataset/benchmark.html) model as the Text Encoder, the Autoencoder from [latent-diffusion](https://github.com/CompVis/latent-diffusion), and [ESRGAN](https://github.com/xinntao/ESRGAN) as the Super Resolution model. We pre-trained the Latent Diffusion Model on 20 million image-text pairs from the [Noah-Wukong](https://wukong-dataset.github.io/wukong-dataset/) dataset.
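
The order in which the four parts run can be sketched in plain Python. This is only an illustration of the data flow, not the actual implementation: `run_text_to_image`, the `toy_*` stand-ins, and `latent_size` are hypothetical names and values chosen so the sketch runs without any model weights.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def run_text_to_image(
    prompt: str,
    text_encoder: Callable[[str], Vector],
    denoiser: Callable[[Vector, Vector], Vector],
    decoder: Callable[[Vector], Vector],
    upscaler: Optional[Callable[[Vector], Vector]] = None,
    latent_size: int = 4,  # arbitrary toy size, not the real latent shape
    seed: int = 0,
) -> Vector:
    """Chain the stages in the order the model card lists them."""
    embedding = text_encoder(prompt)  # 1. text -> embedding
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(latent_size)]  # random noise
    latent = denoiser(latent, embedding)  # 2. process noise, conditioned on text
    image = decoder(latent)  # 3. latent -> image
    if upscaler is not None:  # 4. optional super resolution
        image = upscaler(image)
    return image


# Hypothetical stand-ins for the real networks.
def toy_encoder(prompt: str) -> Vector:
    return [float(len(prompt))]


def toy_denoiser(latent: Vector, embedding: Vector) -> Vector:
    return [0.5 * z + embedding[0] for z in latent]


def toy_decoder(latent: Vector) -> Vector:
    return list(latent)


def toy_upscaler(image: Vector) -> Vector:
    # Repeat every value twice to mimic a 2x increase in resolution.
    return [pixel for pixel in image for _ in range(2)]


low_res = run_text_to_image("番茄炒蛋", toy_encoder, toy_denoiser, toy_decoder)
high_res = run_text_to_image(
    "番茄炒蛋", toy_encoder, toy_denoiser, toy_decoder, toy_upscaler
)
```

Keeping the upscaler as an optional last stage mirrors the fact that super resolution is a separate model applied after decoding.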

	We then fine-tuned the model on a private food dataset so that it generates high-quality food images.

	## Usage

	The model is built on Diffusers, so install Diffusers first:

	```
	pip install diffusers
	```
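
Before running the example, it can help to confirm that the dependencies are importable. The sketch below is a minimal check under one assumption: that `torch` is needed in addition to `diffusers`, which the `pytorch` tag and the `.to("cuda")` call suggest but the install step does not state. `missing_packages` is a hypothetical helper name.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: torch is required alongside diffusers.
missing = missing_packages(["diffusers", "torch"])
if missing:
    print("Install before running the example:", ", ".join(missing))
```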

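	```python
	# LdmZhPipeline is a standalone module (not shipped in the diffusers package),
	# so LdmZhPipeline.py must be importable from the working directory.
	from LdmZhPipeline import LDMZhTextToImagePipeline

	# Load the pretrained weights and move the pipeline to the GPU (needs CUDA).
	generator = LDMZhTextToImagePipeline.from_pretrained("alibaba-pai/pai-diffusion-food-large-zh")
	generator.to("cuda")

	# The prompt is in Chinese: "scrambled eggs with tomato".
	image = generator("番茄炒蛋").images[0]
	image.save("food.png")
	```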

	The super-resolution module is disabled by default. To enable it, pass `use_sr=True`.

	```python
	image = generator("番茄炒蛋", use_sr=True).images[0]
	```
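
To render several dishes in one run, the two calls above can be wrapped in a small loop. This is a minimal sketch: `generate_and_save` and the `food_000.png` naming scheme are hypothetical, and the only thing assumed about the pipeline is what the examples above show, namely that `generator(prompt, use_sr=...)` returns an object whose `images[0]` has a `save` method.

```python
from pathlib import Path
from typing import Iterable, List


def generate_and_save(
    generator,
    prompts: Iterable[str],
    out_dir: str,
    use_sr: bool = False,
) -> List[Path]:
    """Generate one image per prompt and save it as a PNG in out_dir."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    saved = []
    for index, prompt in enumerate(prompts):
        # Same call pattern as the single-image examples above.
        image = generator(prompt, use_sr=use_sr).images[0]
        path = target / f"food_{index:03d}.png"
        image.save(str(path))
        saved.append(path)
    return saved
```

With a loaded pipeline, a call could look like `generate_and_save(generator, ["番茄炒蛋", "宫保鸡丁"], "outputs", use_sr=True)`. The second prompt (Kung Pao chicken) is only an illustrative example.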