svjack committed 34238e4 (parent: 4bdf412): Create README.md

Files changed (1): README.md (added, +165 -0)

---
language: zh
license: other
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- zh
- Chinese
inference: false
extra_gated_prompt: |-
  One more step before getting this model.
  This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
  The CreativeML OpenRAIL License specifies:

  1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content
  2. rinna Co., Ltd. claims no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license
  3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully)
  Please read the full license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license

  By clicking on "Access repository" below, you accept that your *contact information* (email address and username) can be shared with the model authors as well.

extra_gated_fields:
  I have read the License and agree with its terms: checkbox
---

# Chinese Stable Diffusion Model Card

<!--
![rinna](https://github.com/rinnakk/japanese-clip/blob/master/data/rinna.png?raw=true)
-->

svjack/Stable-Diffusion-FineTuned-zh-v0 is a Chinese-specific latent text-to-image diffusion model capable of generating images from Chinese text prompts.

This model was trained with the [diffusers](https://github.com/huggingface/diffusers) library, building on the baseline model [Taiyi-Stable-Diffusion-1B-Chinese-v0.1](https://huggingface.co/IDEA-CCNL/Taiyi-Stable-Diffusion-1B-Chinese-v0.1) from [IDEA-CCNL](https://github.com/IDEA-CCNL/Fengshenbang-LM).
For more information about our training method, see [train_zh_model.py](https://github.com/svjack/Stable-Diffusion-Chinese-Extend/blob/main/train_zh_model.py).

<!--
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/rinnakk/japanese-stable-diffusion/blob/master/scripts/txt2img.ipynb)
-->

## Model Details
- **Developed by:** Zhipeng Yang
- **Model type:** Diffusion-based text-to-image generation model
- **Language(s):** Chinese
- **License:** [The CreativeML OpenRAIL M license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) is an [Open RAIL M license](https://www.licenses.ai/blog/2022/8/18/naming-convention-of-responsible-ai-licenses), adapted from the work that [BigScience](https://bigscience.huggingface.co/) and [the RAIL Initiative](https://www.licenses.ai/) are jointly carrying out in the area of responsible AI licensing. See also [the article about the BLOOM Open RAIL license](https://bigscience.huggingface.co/blog/the-bigscience-rail-license) on which our license is based.
- **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model (LDM)](https://arxiv.org/abs/2112.10752) that uses [Stable Diffusion](https://github.com/CompVis/stable-diffusion) as a pre-trained model.
- **Resources for more information:** [https://github.com/svjack/Stable-Diffusion-Chinese-Extend](https://github.com/svjack/Stable-Diffusion-Chinese-Extend)
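
Because this is a latent diffusion model, the UNet denoises a compressed latent tensor rather than raw pixels, and the VAE decoder maps the final latent back to an image. A minimal sketch of that shape arithmetic, assuming this checkpoint keeps the Stable Diffusion v1 autoencoder (4 latent channels, 8x spatial downsampling); `latent_shape` is a hypothetical helper, not a diffusers API:

```python
# Assumption: Stable Diffusion v1 VAE settings (4 latent channels,
# spatial downsampling factor 8); these are not read from this model's config.
LATENT_CHANNELS = 4
VAE_SCALE_FACTOR = 8


def latent_shape(height: int, width: int, batch_size: int = 1) -> tuple:
    """Return the (batch, channels, h, w) shape of the latent the UNet denoises."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError(f"height and width must be divisible by {VAE_SCALE_FACTOR}")
    return (batch_size, LATENT_CHANNELS,
            height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


# A 512x512 RGB image is denoised as a 4x64x64 latent.
print(latent_shape(512, 512))  # (1, 4, 64, 64)

# Ratio of pixel values to latent values at 512x512.
pixels = 512 * 512 * 3
latents = 4 * 64 * 64
print(pixels // latents)  # 48
```

This is also why the pipeline expects `height` and `width` to be multiples of 8.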

## Examples

First, install the required packages. The example uses the standard `StableDiffusionPipeline` from [🤗's Diffusers library](https://github.com/huggingface/diffusers); `scipy` is needed for `LMSDiscreteScheduler`.

```bash
pip install diffusers==0.6.0 transformers torch datasets accelerate sentencepiece scipy
```

Run this command to log in with your Hugging Face Hub token if you haven't already:

```bash
huggingface-cli login
```

Run the pipeline with the `LMSDiscreteScheduler` scheduler:

```python
from diffusers import LMSDiscreteScheduler, StableDiffusionPipeline

# K-LMS scheduler with the Stable Diffusion v1 noise schedule
# (assumed to match this checkpoint, which is based on Stable Diffusion).
scheduler = LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear")
pipeline = StableDiffusionPipeline.from_pretrained("svjack/Stable-Diffusion-FineTuned-zh-v1", scheduler=scheduler)
# Disable the NSFW safety checker: return the images unchanged and flag none of them.
pipeline.safety_checker = lambda images, clip_input: (images, [False] * len(images))
pipeline = pipeline.to("cuda")

prompt = '女孩们打开了另一世界的大门'  # "The girls opened the door to another world"
image = pipeline(prompt, guidance_scale=7.5).images[0]
image.save("output.png")
```
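
The `guidance_scale=7.5` argument sets the strength of classifier-free guidance. At each denoising step, diffusers' Stable Diffusion pipeline predicts the noise twice, once unconditionally (from an empty prompt by default) and once from your prompt, and then pushes the result toward the prompt-conditioned prediction. The NumPy sketch below shows only that combination step, with random arrays standing in for the two UNet outputs; `apply_guidance` is a hypothetical helper, not a diffusers function.

```python
import numpy as np


def apply_guidance(noise_uncond: np.ndarray, noise_text: np.ndarray,
                   guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the text-conditioned one."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)


rng = np.random.default_rng(0)
# Stand-ins for the UNet's two noise predictions on a 1x4x64x64 latent.
noise_uncond = rng.standard_normal((1, 4, 64, 64))
noise_text = rng.standard_normal((1, 4, 64, 64))

guided = apply_guidance(noise_uncond, noise_text, guidance_scale=7.5)
# With guidance_scale=1.0 the result is just the text-conditioned prediction.
assert np.allclose(apply_guidance(noise_uncond, noise_text, 1.0), noise_text)
print(guided.shape)  # (1, 4, 64, 64)
```

Larger values steer images more strongly toward the prompt, usually at some cost to image quality; in diffusers, guidance is applied only when `guidance_scale > 1`.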

### Generated Results Comparison
[https://github.com/svjack/Stable-Diffusion-Chinese-Extend](https://github.com/svjack/Stable-Diffusion-Chinese-Extend)

![dragon_v0](https://github.com/svjack/Stable-Diffusion-Chinese-Extend/blob/main/imgs/dragon_v0.jpg?raw=true)
![dragon_style_v0](https://github.com/svjack/Stable-Diffusion-Chinese-Extend/blob/main/imgs/dragon_style_v0.jpg?raw=true)
![girl_v0](https://github.com/svjack/Stable-Diffusion-Chinese-Extend/blob/main/imgs/girl_v0.jpg?raw=true)
![girl_style_v0](https://github.com/svjack/Stable-Diffusion-Chinese-Extend/blob/main/imgs/girl_style_v0.jpg?raw=true)

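
To arrange your own generations side by side, as in the comparison images above, one option is to tile equally sized images into a grid. A sketch assuming each image is a NumPy array of shape (H, W, 3), for example `np.asarray(image)` on the pipeline's PIL output; `make_grid` is a hypothetical helper, not part of diffusers:

```python
import numpy as np


def make_grid(images: list, rows: int, cols: int) -> np.ndarray:
    """Tile `rows * cols` equally sized (H, W, C) images into one array."""
    if len(images) != rows * cols:
        raise ValueError(f"expected {rows * cols} images, got {len(images)}")
    # Concatenate each row horizontally, then stack the rows vertically.
    row_strips = [np.concatenate(images[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(row_strips, axis=0)


# Four dummy 64x64 RGB "generations", e.g. plain vs. styled prompts.
tiles = [np.full((64, 64, 3), fill_value=v, dtype=np.uint8)
         for v in (0, 85, 170, 255)]
grid = make_grid(tiles, rows=2, cols=2)
print(grid.shape)  # (128, 128, 3)
```

With Pillow installed, `Image.fromarray(grid)` turns the result back into an image that can be saved.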
<!--
_Note: `JapaneseStableDiffusionPipeline` is almost the same as diffusers' `StableDiffusionPipeline` but adds some lines to initialize our models properly._

## Misuse, Malicious Use, and Out-of-Scope Use
_Note: This section is taken from the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini), but applies in the same way to Stable Diffusion v1._

The model should not be used to intentionally create or disseminate images that create hostile or alienating environments for people. This includes generating images that people would foreseeably find disturbing, distressing, or offensive; or content that propagates historical or current stereotypes.

### Out-of-Scope Use
The model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.

### Misuse and Malicious Use
Using the model to generate content that is cruel to individuals is a misuse of this model. This includes, but is not limited to:

- Generating demeaning, dehumanizing, or otherwise harmful representations of people or their environments, cultures, religions, etc.
- Intentionally promoting or propagating discriminatory content or harmful stereotypes.
- Impersonating individuals without their consent.
- Sexual content without consent of the people who might see it.
- Mis- and disinformation
- Representations of egregious violence and gore
- Sharing of copyrighted or licensed material in violation of its terms of use.
- Sharing content that is an alteration of copyrighted or licensed material in violation of its terms of use.

## Limitations and Bias

### Limitations

- The model does not achieve perfect photorealism
- The model cannot render legible text
- The model does not perform well on more difficult tasks which involve compositionality, such as rendering an image corresponding to “A red cube on top of a blue sphere”
- Faces and people in general may not be generated properly.
- The model was trained mainly with Japanese captions and will not work as well in other languages.
- The autoencoding part of the model is lossy
- The model was trained on a subset of a large-scale dataset
  [LAION-5B](https://laion.ai/blog/laion-5b/) which contains adult material
  and is not fit for product use without additional safety mechanisms and
  considerations.
- No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data.
  The training data can be searched at [https://rom1504.github.io/clip-retrieval/](https://rom1504.github.io/clip-retrieval/) to possibly assist in the detection of memorized images.

### Bias

While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
Japanese Stable Diffusion was trained on Japanese datasets including [LAION-5B](https://laion.ai/blog/laion-5b/) with Japanese captions,
which consists of images that are primarily limited to Japanese descriptions.
Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
This affects the overall output of the model.
Further, the ability of the model to generate content with non-Japanese prompts is significantly worse than with Japanese-language prompts.

### Safety Module

The intended use of this model is with the [Safety Checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) in Diffusers.
This checker works by checking model outputs against known hard-coded NSFW concepts.
The concepts are intentionally hidden to reduce the likelihood of reverse-engineering this filter.
Specifically, the checker compares the class probability of harmful concepts in the embedding space of the `CLIPTextModel` *after generation* of the images.
The concepts are passed into the model with the generated image and compared to a hand-engineered weight for each NSFW concept.

## Training

**Training Data**
We used the following dataset for training the model:

- Approximately 100 million images with Japanese captions, including the Japanese subset of [LAION-5B](https://laion.ai/blog/laion-5b/).

**Training Procedure**
Japanese Stable Diffusion has the same architecture as Stable Diffusion and was trained by using Stable Diffusion. Because Stable Diffusion was trained on English dataset and the CLIP tokenizer is basically for English, we had 2 stages to transfer to a language-specific model, inspired by [PITI](https://arxiv.org/abs/2205.12952).

1. Train a Japanese-specific text encoder with our Japanese tokenizer from scratch with the latent diffusion model fixed. This stage is expected to map Japanese captions to Stable Diffusion's latent space.
2. Fine-tune the text encoder and the latent diffusion model jointly. This stage is expected to generate Japanese-style images more.

[//]: # (_Note: Japanese Stable Diffusion is still running and this checkpoint is the current best one. We might update to a better checkpoint via this repository._)
-->