votepurchase Linaqruf committed on
Commit 9f434e4 · verified · 0 parent(s)

Duplicate from cagliostrolab/animagine-xl-4.0

Co-authored-by: Furqanil Taqwa <Linaqruf@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
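To see which files in this commit end up in Git LFS, you can match each path against the patterns above. The sketch below is an approximation, not Git's own matcher. It assumes Python's `fnmatch.fnmatchcase` is close enough for these simple globs. Slash-free patterns are matched against the file's basename, which is how Git treats them. `saved_model/**/*` is matched against the full path, so the sketch misses Git's rule that `**/` can also match zero directories. `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# The 35 patterns from .gitattributes above; every one maps to filter=lfs.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy", "*.npz",
    "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl", "*.pt", "*.pth",
    "*.rar", "*.safetensors", "saved_model/**/*", "*.tar.*", "*.tar", "*.tflite",
    "*.tgz", "*.wasm", "*.xz", "*.zip", "*.zst", "*tfevents*",
]

def is_lfs_tracked(path: str) -> bool:
    """Approximate Git attribute matching for the LFS patterns above."""
    for pattern in LFS_PATTERNS:
        if "/" in pattern:
            # Patterns containing a slash are compared with the whole repo-relative path.
            if fnmatchcase(path, pattern):
                return True
        elif fnmatchcase(basename(path), pattern):
            # Slash-free patterns match the file name in any directory.
            return True
    return False

for f in ["animagine-xl-4.0.safetensors", "unet/diffusion_pytorch_model.safetensors",
          "unet/config.json", "tokenizer/vocab.json", "README.md"]:
    print(f"{f}: {'LFS' if is_lfs_tracked(f) else 'regular git'}")
```

In this commit, only the `.safetensors` weights match. That is why they appear below as three-line LFS pointers, while the JSON configs and tokenizer files are stored directly.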
README.md ADDED
@@ -0,0 +1,291 @@
+ ---
+ language:
+ - en
+ tags:
+ - text-to-image
+ - stable-diffusion
+ - safetensors
+ - stable-diffusion-xl
+ widget:
+ - text: >-
+     1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors,
+     night, turtleneck, masterpiece, high score, great score, absurdres
+   parameter:
+     negative_prompt: >-
+       lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit,
+       fewer digits, cropped, worst quality, low quality, low score, bad score,
+       average score, signature, watermark, username, blurry
+   example_title: 1girl
+ - text: >-
+     1boy, male focus, green hair, sweater, looking at viewer, upper body,
+     beanie, outdoors, night, turtleneck, masterpiece, high score, great score,
+     absurdres
+   parameter:
+     negative_prompt: >-
+       lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit,
+       fewer digits, cropped, worst quality, low quality, low score, bad score,
+       average score, signature, watermark, username, blurry
+   example_title: 1boy
+ license: openrail++
+ ---
+
+ # Animagine XL 4.0
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/_tsxjwf3VPu94xh9wJSbo.png)
+
+ ## Overview
+
+ **Animagine XL 4.0**, also stylized as **Anim4gine**, is an anime-themed fine-tune of SDXL and the latest installment of the [Animagine XL series](https://huggingface.co/collections/Linaqruf/animagine-xl-669888c0add5adaf09754aca). Although it continues the series, the model was retrained from [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) on a dataset of 8.4M diverse anime-style images from various sources, with a knowledge cut-off of January 7, 2025, and fine-tuned for approximately 2,650 GPU hours. As with the previous version, identity and style were trained with the tag-ordering method.
+
+ ## Model Details
+
+ - **Developed by**: [Cagliostro Research Lab](https://github.com/cagliostrolab)
+ - **Model type**: Diffusion-based text-to-image generative model
+ - **License**: [CreativeML Open RAIL++-M](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md)
+ - **Model Description**: A model that generates and modifies anime-themed images from text prompts
+ - **Fine-tuned from**: [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
+
+ ## Downstream Use
+
+ 1. Try the model in our [`Hugging Face Space`](https://huggingface.co/spaces/cagliostrolab/animagine-xl-4.0)
+ 2. Use it in [`ComfyUI`](https://github.com/comfyanonymous/ComfyUI) or [`Stable Diffusion WebUI`](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
+ 3. Use it with 🧨 `diffusers`
+
+ ## 🧨 Diffusers Installation
+
+ ### 1. Install Required Libraries
+
+ ```bash
+ pip install diffusers transformers accelerate safetensors --upgrade
+ ```
+
+ ### 2. Example Code
+ The example below uses the `lpw_stable_diffusion_xl` community pipeline, which handles long, weighted, and detailed prompts better than the default pipeline. The weights are already stored in FP16, so there is no need to pass `variant="fp16"` to `from_pretrained`.
+
+ ```python
+ import torch
+ from diffusers import StableDiffusionXLPipeline
+
+ pipe = StableDiffusionXLPipeline.from_pretrained(
+     "cagliostrolab/animagine-xl-4.0",
+     torch_dtype=torch.float16,  # the weights are stored in FP16
+     use_safetensors=True,
+     custom_pipeline="lpw_stable_diffusion_xl",  # community pipeline for long/weighted prompts
+     add_watermarker=False
+ )
+ pipe.to('cuda')
+ # A raw string keeps the escaped parentheses "\(" and "\)" intact and avoids Python escape warnings
+ prompt = r"1girl, arima kana, oshi no ko, hoshimachi suisei, hoshimachi suisei \(1st costume\), cosplay, looking at viewer, smile, outdoors, night, v, masterpiece, high score, great score, absurdres"
+ negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry"
+
+ image = pipe(
+     prompt,
+     negative_prompt=negative_prompt,
+     width=832,
+     height=1216,  # 832 x 1216 portrait, one of the recommended resolutions
+     guidance_scale=6,  # within the recommended CFG range of 5-7
+     num_inference_steps=25
+ ).images[0]
+
+ image.save("./arima_kana.png")
+ ```
+
+ ## Usage Guidelines
+
+ ### 1. Prompt Structure
+ The model was trained with tag-based captions and the tag-ordering method. Use this structured template:
+
+ ```
+ 1girl/1boy/1other, character name, series name, everything else in any order
+ ```
+
+ ### 2. Quality Enhancement Tags
+ Add these tags at the start or end of your prompt:
+
+ ```
+ masterpiece, high score, great score, absurdres
+ ```
+
+ ### 3. Recommended Negative Prompt
+ ```
+ lowres, bad anatomy, bad hands, text, error, missing finger, extra digits, fewer digits, cropped, worst quality, low quality, low score, bad score, average score, signature, watermark, username, blurry
+ ```
+
+ ### 4. Optimal Settings
+ - **CFG Scale**: 5-7 (6 recommended)
+ - **Sampling Steps**: 25-28 (25 recommended)
+ - **Preferred Sampler**: Euler Ancestral (Euler a)
+
+ ### 5. Recommended Resolutions
+
+ | Orientation | Dimensions  | Aspect Ratio  |
+ |-------------|-------------|---------------|
+ | Square      | 1024 x 1024 | 1:1           |
+ | Landscape   | 1152 x 896  | 9:7           |
+ |             | 1216 x 832  | ≈3:2 (19:13)  |
+ |             | 1344 x 768  | 7:4           |
+ |             | 1536 x 640  | 12:5          |
+ | Portrait    | 896 x 1152  | 7:9           |
+ |             | 832 x 1216  | ≈2:3 (13:19)  |
+ |             | 768 x 1344  | 4:7           |
+ |             | 640 x 1536  | 5:12          |
+
+ ### 6. Final Prompt Structure Example
+ ```
+ masterpiece, high score, great score, absurdres, 1girl, firefly \(honkai: star rail\), honkai \(series\), honkai: star rail, casual, solo, looking at viewer, outdoors, smile, reaching towards viewer, night
+ ```
+
+ ## Special Tags
+
+ The model supports special tags that control different aspects of image generation. These tags were tested to give consistent results across different prompts.
+
+ ### Quality Tags
+ Quality tags are basic controls that directly influence overall image quality and level of detail. Available quality tags:
+ - `masterpiece`
+ - `best quality`
+ - `low quality`
+ - `worst quality`
+
+ | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/bDdKraYxjiReKknlYJepR.png" width="100%" style="max-height: 400px; object-fit: contain;"> | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/mAgMMKL2tBj8oBuWHTYUz.png" width="100%" style="max-height: 400px; object-fit: contain;"> |
+ |---|---|
+ | Sample image using the `"masterpiece, best quality"` quality tags with the negative prompt left empty. | Sample image using the `"low quality, worst quality"` quality tags with the negative prompt left empty. |
+
+ ### Score Tags
+ Score tags give finer control over image quality than the basic quality tags, and in this model they steer output quality more strongly. Available score tags:
+ - `high score`
+ - `great score`
+ - `good score`
+ - `average score`
+ - `bad score`
+ - `low score`
+
+ | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/PXK6D1yhD8SND-VHFQOXD.png" width="100%" style="max-height: 400px; object-fit: contain;"> | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/0uUw7DQ9IMiSNE_MZ9Uyf.png" width="100%" style="max-height: 400px; object-fit: contain;"> |
+ |---|---|
+ | Sample image using the `"high score, great score"` score tags with the negative prompt left empty. | Sample image using the `"bad score, low score"` score tags with the negative prompt left empty. |
+
+ ### Temporal Tags
+ Temporal tags steer the art style toward a specific year, which is useful for era-specific characteristics. Supported year tags:
+ - `year 2005`
+ - `year {n}` (any year in between)
+ - `year 2025`
+
+ | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/weRv0BmfkZrBhcW5NxXAI.png" width="100%" style="max-height: 400px; object-fit: contain;"> | <img src="https://cdn-uploads.huggingface.co/production/uploads/6365c8dbf31ef76df4042821/WwFoeLrbN2MkXuGHh91Ky.png" width="100%" style="max-height: 400px; object-fit: contain;"> |
+ |---|---|
+ | Sample image of Hatsune Miku with the `"year 2007"` temporal tag. | Sample image of Hatsune Miku with the `"year 2023"` temporal tag. |
+
+ ### Rating Tags
+ Rating tags control the content rating of generated images. Use them responsibly and in accordance with applicable laws and platform policies. Supported ratings:
+ - `safe`
+ - `sensitive`
+ - `nsfw`
+ - `explicit`
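You can apply the prompt template, quality tags, special tags, and resolution table above mechanically. The sketch below uses hypothetical helpers (`build_prompt`, `year_tag`, `closest_resolution`, `exact_ratio`); none of them belong to any library. `build_prompt` orders tags as subject, character, series, then everything else. It optionally puts the quality tags first and checks rating tags against the four listed values. It assumes that every `year {n}` from 2005 to 2025 inclusive is valid. The README does not say where year and rating tags belong within "everything else", so appending them at the end is an arbitrary choice.

```python
from math import gcd

# Tag lists taken from the README sections above.
QUALITY_TAGS = ["masterpiece", "high score", "great score", "absurdres"]
RATING_TAGS = {"safe", "sensitive", "nsfw", "explicit"}
RESOLUTIONS = [  # (width, height) from "Recommended Resolutions"
    (1024, 1024), (1152, 896), (1216, 832), (1344, 768), (1536, 640),
    (896, 1152), (832, 1216), (768, 1344), (640, 1536),
]

def year_tag(year: int) -> str:
    """Temporal tag; assumes every year from 2005 to 2025 inclusive is supported."""
    if not 2005 <= year <= 2025:
        raise ValueError(f"year {year} is outside the documented 2005-2025 range")
    return f"year {year}"

def build_prompt(subject, character=None, series=None, general=(),
                 year=None, rating=None, quality_first=True):
    """Order tags as subject, character, series, then everything else."""
    tags = [subject]
    if character:
        tags.append(character)
    if series:
        tags.append(series)  # may hold several series tags joined by ", "
    tags.extend(general)
    if year is not None:
        tags.append(year_tag(year))
    if rating is not None:
        if rating not in RATING_TAGS:
            raise ValueError(f"unknown rating tag: {rating!r}")
        tags.append(rating)
    quality = ", ".join(QUALITY_TAGS)
    body = ", ".join(tags)
    return f"{quality}, {body}" if quality_first else f"{body}, {quality}"

def closest_resolution(aspect: float) -> tuple:
    """Recommended (width, height) whose width/height ratio is nearest to `aspect`."""
    return min(RESOLUTIONS, key=lambda wh: abs(wh[0] / wh[1] - aspect))

def exact_ratio(width: int, height: int) -> str:
    """Reduced aspect ratio, e.g. 1216 x 832 -> "19:13" (roughly 3:2)."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"

# Rebuilds the "Final Prompt Structure Example" from the README.
prompt = build_prompt(
    "1girl",
    character=r"firefly \(honkai: star rail\)",
    series=r"honkai \(series\), honkai: star rail",
    general=["casual", "solo", "looking at viewer", "outdoors", "smile",
             "reaching towards viewer", "night"],
)
width, height = closest_resolution(2 / 3)  # a portrait request
print(prompt)
print(width, height, exact_ratio(width, height))
```

Raw strings (`r"..."`) keep the backslash-escaped parentheses in tags such as `firefly \(honkai: star rail\)` exactly as written in the template.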
+
+ ## Training Information
+
+ The hardware and hyperparameters used for training are listed below:
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Hardware | 7 x H100 80GB SXM5 |
+ | Num Images | 8,401,464 |
+ | UNet Learning Rate | 2.5e-6 |
+ | Text Encoder Learning Rate | 1.25e-6 |
+ | Scheduler | Constant With Warmup |
+ | Warmup Steps | 5% |
+ | Batch Size | 32 |
+ | Gradient Accumulation Steps | 2 |
+ | Training Resolution | 1024x1024 |
+ | Optimizer | Adafactor |
+ | Input Perturbation Noise | 0.1 |
+ | Debiased Estimation Loss | Enabled |
+ | Mixed Precision | fp16 |
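The table does not say whether the batch size of 32 is per GPU or global, or how many epochs were run. The sketch below therefore works out optimizer steps per epoch under both readings. It also converts the Overview's approximate 2,650 GPU hours into wall-clock time, assuming the hours were spread evenly across all 7 GPUs. The names are illustrative, and the results are arithmetic on the published figures, not reported training logs.

```python
import math

# Figures from the training table and the Overview.
NUM_IMAGES = 8_401_464
BATCH_SIZE = 32      # the table does not say whether this is per GPU or global
GRAD_ACCUM = 2
NUM_GPUS = 7
GPU_HOURS = 2650     # "approximately", per the Overview

def steps_per_epoch(effective_batch: int) -> int:
    """Optimizer updates needed to see every image once, counting a final partial batch."""
    return math.ceil(NUM_IMAGES / effective_batch)

effective_if_global = BATCH_SIZE * GRAD_ACCUM              # 32 is the total batch
effective_if_per_gpu = BATCH_SIZE * NUM_GPUS * GRAD_ACCUM  # 32 on each of the 7 GPUs
wall_clock_hours = GPU_HOURS / NUM_GPUS                    # assumes even use of all GPUs

for label, eff in [("global", effective_if_global), ("per-GPU", effective_if_per_gpu)]:
    print(f"batch {BATCH_SIZE} {label}: effective {eff}, {steps_per_epoch(eff):,} steps/epoch")
print(f"~{wall_clock_hours:.0f} wall-clock hours (~{wall_clock_hours / 24:.1f} days)")
```

The 5% warmup applies to the total step count, which depends on the unstated number of epochs, so the sketch leaves it out.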
+
+ ## Acknowledgement
+
+ This long-term project would not have been possible without the groundbreaking work, innovative contributions, and comprehensive documentation provided by **Stability AI**, **Novel AI**, and the **Waifu Diffusion Team**. We are especially grateful for the kickstarter grant from **Main** that enabled us to progress beyond V2. For this iteration, we would like to express our sincere gratitude to everyone in the community for their continuous support, particularly:
+
+ 1. [**Moescape AI**](https://moescape.ai/): Our invaluable collaboration partner in model distribution and testing
+ 2. **Lesser Rabbit**: For providing essential computing and research grants
+ 3. [**Kohya SS**](https://github.com/kohya-ss): For developing the comprehensive open-source training framework
+ 4. [**discus0434**](https://github.com/discus0434): For creating the open-source Aesthetic Predictor 2.5
+ 5. **Early testers**: For their dedication in providing critical feedback and thorough quality assurance
+
+ ## Contributors
+
+ We extend our heartfelt appreciation to our dedicated team members who have contributed significantly to this project, including but not limited to:
+
+ ### Model
+ - [**KayfaHaarukku**](https://huggingface.co/kayfahaarukku)
+ - [**Raelina**](https://huggingface.co/Raelina)
+ - [**Linaqruf**](https://huggingface.co/Linaqruf)
+
+ ### Gradio
+ - [**Damar Jati**](https://huggingface.co/DamarJati)
+
+ ### Relations, finance, and quality assurance
+ - [**Scipius**](https://huggingface.co/Scipius2121)
+ - [**Asahina**](https://huggingface.co/Asahina2K)
+ - [**Bell**](https://huggingface.co/ItsMeBell)
+ - [**BoboiAzumi**](https://huggingface.co/Boboiazumi)
+
+ ### Data
+ - [**Pomegranata**](https://huggingface.co/paripi)
+ - [**Kr1SsSzz**](https://huggingface.co/Kr1SsSzz)
+ - [**Fiqi**](https://huggingface.co/saikanov)
+ - [**William Adams Soeherman**](https://huggingface.co/williamsoeherman)
+
+ ## Fundraising Is Now Open Again!
+
+ We’re excited to reopen fundraising to pay for new training, research, and model development. Your support helps us push the boundaries of what’s possible with AI.
+
+ **Ways to help:**
+
+ * **Donate**: Contribute ETH, USDT, or USDC(e) to the address below.
+
+ * **Share**: Spread the word about our models and share your creations!
+
+ * **Feedback**: Let us know how we can improve.
+
+ **Donation Address**:
+
+ ETH/USDT/USDC(e): ```0xd8A1dA94BA7E6feCe8CfEacc1327f498fCcBFC0C```
+
+
+ <details>
+ <summary>Why do we use Cryptocurrency?</summary>
+ When we first opened fundraising through Ko-fi with PayPal as the withdrawal method, our PayPal account was flagged and eventually banned, despite our efforts to explain the purpose of our project. This forced us to refund all donations and left us without a reliable way to receive support. To avoid such issues and ensure transparency, we have switched to cryptocurrency for fundraising.
+ </details>
+
+ <details>
+ <summary>Want to Donate in Non-Crypto Currency?</summary>
+ If you’d like to support us but prefer not to use cryptocurrency, feel free to contact us via our [Discord Server](https://discord.gg/cqh9tZgbGc) for alternative donation methods.
+ </details>
+
+ ## Join Our Discord Server
+ Feel free to join our Discord server:
+ <div style="text-align: center;">
+ <a href="https://discord.gg/cqh9tZgbGc">
+ <img src="https://discord.com/api/guilds/1115542847395987519/widget.png?style=banner2" alt="Discord Banner 2"/>
+ </a>
+ </div>
+
+ ## Limitations
+
+ - **Prompt Format**: Limited to tag-based text prompts; natural language input may not be effective
+ - **Anatomy**: May struggle with complex anatomical details, particularly hand poses and finger counts
+ - **Text Generation**: Rendering text in images is currently not supported and not recommended
+ - **New Characters**: Recent characters may be rendered less accurately because little training data is available for them
+ - **Multiple Characters**: Scenes with multiple characters may require careful prompt engineering
+ - **Resolution**: Higher resolutions (e.g., 1536x1536) may show degradation because training used the native SDXL resolution (1024x1024)
+ - **Style Consistency**: May require specific style tags, as training focused more on identity preservation than on style consistency
+
+ ## License
+
+ This model adopts the original [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) from Stability AI without any modifications or additional restrictions. The terms are exactly those of the original SDXL license, summarized below:
+
+ - ✅ **Permitted**: Commercial use, modifications, distributions, private use
+ - ❌ **Prohibited**: Illegal activities, harmful content generation, discrimination, exploitation
+ - ⚠️ **Requirements**: Include license copy, state changes, preserve notices
+ - 📝 **Warranty**: Provided "AS IS" without warranties
+
+ Please refer to the [original SDXL license](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/LICENSE.md) for the complete and authoritative terms and conditions.
animagine-xl-4.0.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d5b43ff75b6ab598502d4c779d2fbfa3dceca51c60c3b609640a60772333916
+ size 6938434056
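Each weight file in this commit is stored as a Git LFS pointer: a spec `version` line, an `oid` with the SHA-256 of the real file, and its `size` in bytes. The sketch below parses such a pointer and can check a downloaded file against it. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, not part of Git LFS or `huggingface_hub`. It also adds up the sizes of the diffusers-format component files from this same commit. They total 6,938,011,430 bytes, 422,626 bytes less than the single-file checkpoint above.

```python
import hashlib

# Pointer for animagine-xl-4.0.safetensors, copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:1d5b43ff75b6ab598502d4c779d2fbfa3dceca51c60c3b609640a60772333916
size 6938434056
"""

# Sizes of the diffusers-format weight files added in this commit.
COMPONENT_SIZES = {
    "unet/diffusion_pytorch_model.safetensors": 5_135_149_760,
    "text_encoder_2/model.safetensors": 1_389_382_176,
    "text_encoder/model.safetensors": 246_144_152,
    "vae/diffusion_pytorch_model.safetensors": 167_335_342,
}

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream a downloaded file and compare its byte size and SHA-256 with the pointer."""
    sha, size = hashlib.sha256(), 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            sha.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and sha.hexdigest() == pointer["digest"]

pointer = parse_lfs_pointer(POINTER_TEXT)
component_total = sum(COMPONENT_SIZES.values())
print(f"single file: {pointer['size'] / 1e9:.2f} GB")
print(f"components:  {component_total / 1e9:.2f} GB")
```

`matches_pointer` reads the file in 1 MiB chunks, so checking a ~7 GB checkpoint does not require loading it into memory.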
model_index.json ADDED
@@ -0,0 +1,41 @@
+ {
+   "_class_name": "StableDiffusionXLPipeline",
+   "_diffusers_version": "0.32.2",
+   "feature_extractor": [
+     null,
+     null
+   ],
+   "force_zeros_for_empty_prompt": true,
+   "image_encoder": [
+     null,
+     null
+   ],
+   "scheduler": [
+     "diffusers",
+     "EulerAncestralDiscreteScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "text_encoder_2": [
+     "transformers",
+     "CLIPTextModelWithProjection"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "tokenizer_2": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
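`diffusers` reads `model_index.json` to decide which library and class to load from each subfolder. Entries set to `[null, null]` (here `feature_extractor` and `image_encoder`) are optional components that this checkpoint does not include. The hypothetical `components` helper below mimics that bookkeeping on a compacted copy of the file. It loads no weights; it only shows which folder maps to which class.

```python
import json

# Compacted copy of model_index.json from this commit.
MODEL_INDEX = json.loads("""
{"_class_name": "StableDiffusionXLPipeline", "_diffusers_version": "0.32.2",
 "feature_extractor": [null, null], "force_zeros_for_empty_prompt": true,
 "image_encoder": [null, null],
 "scheduler": ["diffusers", "EulerAncestralDiscreteScheduler"],
 "text_encoder": ["transformers", "CLIPTextModel"],
 "text_encoder_2": ["transformers", "CLIPTextModelWithProjection"],
 "tokenizer": ["transformers", "CLIPTokenizer"],
 "tokenizer_2": ["transformers", "CLIPTokenizer"],
 "unet": ["diffusers", "UNet2DConditionModel"],
 "vae": ["diffusers", "AutoencoderKL"]}
""")

def components(index: dict) -> dict:
    """Map each component folder to (library, class), skipping unset [null, null] slots."""
    result = {}
    for name, value in index.items():
        if name.startswith("_") or not isinstance(value, list):
            continue  # metadata keys and plain flags such as force_zeros_for_empty_prompt
        library, cls = value
        if library is None:
            continue  # optional component not shipped with this checkpoint
        result[name] = (library, cls)
    return result

for name, (library, cls) in sorted(components(MODEL_INDEX).items()):
    print(f"{name:15s} {library}.{cls} -> ./{name}/")
```

The seven resulting entries correspond one-to-one to the subfolders added in this commit (`scheduler/`, `text_encoder/`, `text_encoder_2/`, `tokenizer/`, `tokenizer_2/`, `unet/`, `vae/`).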
scheduler/scheduler_config.json ADDED
@@ -0,0 +1,19 @@
+ {
+   "_class_name": "EulerAncestralDiscreteScheduler",
+   "_diffusers_version": "0.32.2",
+   "beta_end": 0.012,
+   "beta_schedule": "scaled_linear",
+   "beta_start": 0.00085,
+   "clip_sample": false,
+   "interpolation_type": "linear",
+   "num_train_timesteps": 1000,
+   "prediction_type": "epsilon",
+   "rescale_betas_zero_snr": false,
+   "sample_max_value": 1.0,
+   "set_alpha_to_one": false,
+   "skip_prk_steps": true,
+   "steps_offset": 1,
+   "timestep_spacing": "leading",
+   "trained_betas": null,
+   "use_karras_sigmas": false
+ }
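This config sets the noise schedule. It also determines which training timesteps a 25-step run (the README's recommended setting) visits. The sketch below re-derives both in plain Python. It assumes `diffusers` builds `scaled_linear` betas by spacing sqrt(beta) linearly and squaring, and that `leading` spacing takes descending multiples of `num_train_timesteps // steps` plus `steps_offset`. The helper names are hypothetical. The library computes in float32, so trailing digits may differ slightly from this float64 version.

```python
import math

# Values copied from scheduler/scheduler_config.json.
BETA_START = 0.00085
BETA_END = 0.012
NUM_TRAIN_TIMESTEPS = 1000
STEPS_OFFSET = 1

def scaled_linear_betas(start=BETA_START, end=BETA_END, n=NUM_TRAIN_TIMESTEPS):
    """'scaled_linear': space sqrt(beta) linearly from sqrt(start) to sqrt(end), then square."""
    lo, hi = math.sqrt(start), math.sqrt(end)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]

def sigmas_from_betas(betas):
    """sigma_t = sqrt((1 - alpha_bar_t) / alpha_bar_t), with alpha_bar_t = prod(1 - beta_s)."""
    sigmas, alpha_bar = [], 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        sigmas.append(math.sqrt((1.0 - alpha_bar) / alpha_bar))
    return sigmas

def leading_timesteps(num_inference_steps, n=NUM_TRAIN_TIMESTEPS, offset=STEPS_OFFSET):
    """'leading' spacing: multiples of n // steps in descending order, shifted by steps_offset."""
    step_ratio = n // num_inference_steps
    return [i * step_ratio + offset for i in reversed(range(num_inference_steps))]

betas = scaled_linear_betas()
sigmas = sigmas_from_betas(betas)
timesteps = leading_timesteps(25)  # the README's recommended step count
print("first/last timesteps:", timesteps[:3], timesteps[-3:])
print(f"sigma range visited: {sigmas[timesteps[-1]]:.3f} .. {sigmas[timesteps[0]]:.3f}")
```

Because the timesteps here are integers, Euler Ancestral's `interpolation_type: "linear"` lookup amounts to indexing the sigma table directly.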
text_encoder/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "CLIPTextModel"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "quick_gelu",
+   "hidden_size": 768,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "projection_dim": 768,
+   "torch_dtype": "float16",
+   "transformers_version": "4.48.1",
+   "vocab_size": 49408
+ }
text_encoder/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73528dedc471ea62c5d24bab6621e2ef6890aa136ca080c53f2d205c39ab18df
+ size 246144152
text_encoder_2/config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "CLIPTextModelWithProjection"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dropout": 0.0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_size": 1280,
+   "initializer_factor": 1.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 5120,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 77,
+   "model_type": "clip_text_model",
+   "num_attention_heads": 20,
+   "num_hidden_layers": 32,
+   "pad_token_id": 1,
+   "projection_dim": 1280,
+   "torch_dtype": "float16",
+   "transformers_version": "4.48.1",
+   "vocab_size": 49408
+ }
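SDXL conditions its UNet on both text encoders at once. The SDXL pipeline in `diffusers` takes each encoder's per-token hidden states from the penultimate layer and concatenates them along the feature axis. With the sizes in the two configs above, that gives 768 + 1280 = 2048 features per token, which matches `cross_attention_dim` in `unet/config.json`. The sketch below checks that arithmetic with values copied from the configs. The dict and function names are illustrative.

```python
# Fields copied from text_encoder/config.json and text_encoder_2/config.json.
TEXT_ENCODER = {"hidden_size": 768, "num_attention_heads": 12,
                "num_hidden_layers": 12, "max_position_embeddings": 77}
TEXT_ENCODER_2 = {"hidden_size": 1280, "num_attention_heads": 20,
                  "num_hidden_layers": 32, "max_position_embeddings": 77}
UNET_CROSS_ATTENTION_DIM = 2048  # from unet/config.json

def head_dim(cfg: dict) -> int:
    """Per-head width of an encoder's self-attention layers."""
    assert cfg["hidden_size"] % cfg["num_attention_heads"] == 0
    return cfg["hidden_size"] // cfg["num_attention_heads"]

def prompt_embedding_shape(batch: int = 1) -> tuple:
    """Shape of the UNet's text conditioning: both encoders' token states side by side."""
    seq_len = TEXT_ENCODER["max_position_embeddings"]
    width = TEXT_ENCODER["hidden_size"] + TEXT_ENCODER_2["hidden_size"]
    return (batch, seq_len, width)

shape = prompt_embedding_shape()
print("head dims:", head_dim(TEXT_ENCODER), head_dim(TEXT_ENCODER_2))
print("prompt_embeds shape:", shape)
assert shape[-1] == UNET_CROSS_ATTENTION_DIM  # 768 + 1280 == 2048
```

Both encoders use 64-wide attention heads. They differ in width and depth (12 vs. 32 layers), not in head size.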
text_encoder_2/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1669a15b906d7a8cc7e919fa44769750e612fe580e3d8ceed60acbc212aea1ca
+ size 1389382176
tokenizer/merges.txt ADDED
The diff for this file is too large to render.
tokenizer/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<|endoftext|>",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer/tokenizer_config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 77,
+   "pad_token": "<|endoftext|>",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
tokenizer/vocab.json ADDED
The diff for this file is too large to render.
tokenizer_2/merges.txt ADDED
The diff for this file is too large to render.
tokenizer_2/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<|startoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "!",
+   "unk_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer_2/tokenizer_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "0": {
+       "content": "!",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49406": {
+       "content": "<|startoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "49407": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<|startoftext|>",
+   "clean_up_tokenization_spaces": true,
+   "do_lower_case": true,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 77,
+   "pad_token": "!",
+   "tokenizer_class": "CLIPTokenizer",
+   "unk_token": "<|endoftext|>"
+ }
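Both tokenizers cap input at `model_max_length` = 77 tokens, counting `<|startoftext|>` (49406) and `<|endoftext|>` (49407). They pad differently, though: `tokenizer` pads with `<|endoftext|>`, while `tokenizer_2` pads with `!` (id 0). The sketch below shows the general windowing idea behind long-prompt pipelines such as the README's `lpw_stable_diffusion_xl`. It splits the content tokens into 75-token windows, frames each window with BOS/EOS, and pads it to 77 with the tokenizer's own pad id. This is a simplified illustration with made-up token ids and a hypothetical `chunk_and_pad` helper. The community pipeline's actual handling (prompt weights, and how windows are padded and joined) may differ.

```python
BOS_ID, EOS_ID = 49406, 49407   # <|startoftext|>, <|endoftext|> in both tokenizer configs
PAD_IDS = {"tokenizer": 49407,  # pad_token "<|endoftext|>"
           "tokenizer_2": 0}    # pad_token "!" (id 0 in added_tokens_decoder)
MODEL_MAX_LENGTH = 77
CHUNK = MODEL_MAX_LENGTH - 2    # room left for BOS and EOS

def chunk_and_pad(token_ids, pad_id):
    """Split content tokens into 75-token windows, each framed as BOS ... EOS and padded to 77."""
    windows = [token_ids[i:i + CHUNK] for i in range(0, len(token_ids), CHUNK)] or [[]]
    out = []
    for window in windows:
        seq = [BOS_ID] + window + [EOS_ID]
        seq += [pad_id] * (MODEL_MAX_LENGTH - len(seq))
        out.append(seq)
    return out

# Hypothetical token ids standing in for a 100-token tag prompt.
prompt_ids = list(range(1000, 1100))
for name, pad_id in PAD_IDS.items():
    chunks = chunk_and_pad(prompt_ids, pad_id)
    print(name, [len(c) for c in chunks], "last pad id:", chunks[-1][-1])
```

Without some form of windowing, tokens past the 77-token limit are truncated, which is why tag-heavy prompts are often paired with a long-prompt pipeline.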
tokenizer_2/vocab.json ADDED
The diff for this file is too large to render.
unet/config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "_class_name": "UNet2DConditionModel",
+   "_diffusers_version": "0.32.2",
+   "act_fn": "silu",
+   "addition_embed_type": "text_time",
+   "addition_embed_type_num_heads": 64,
+   "addition_time_embed_dim": 256,
+   "attention_head_dim": [
+     5,
+     10,
+     20
+   ],
+   "attention_type": "default",
+   "block_out_channels": [
+     320,
+     640,
+     1280
+   ],
+   "center_input_sample": false,
+   "class_embed_type": null,
+   "class_embeddings_concat": false,
+   "conv_in_kernel": 3,
+   "conv_out_kernel": 3,
+   "cross_attention_dim": 2048,
+   "cross_attention_norm": null,
+   "down_block_types": [
+     "DownBlock2D",
+     "CrossAttnDownBlock2D",
+     "CrossAttnDownBlock2D"
+   ],
+   "downsample_padding": 1,
+   "dropout": 0.0,
+   "dual_cross_attention": false,
+   "encoder_hid_dim": null,
+   "encoder_hid_dim_type": null,
+   "flip_sin_to_cos": true,
+   "freq_shift": 0,
+   "in_channels": 4,
+   "layers_per_block": 2,
+   "mid_block_only_cross_attention": null,
+   "mid_block_scale_factor": 1,
+   "mid_block_type": "UNetMidBlock2DCrossAttn",
+   "norm_eps": 1e-05,
+   "norm_num_groups": 32,
+   "num_attention_heads": null,
+   "num_class_embeds": null,
+   "only_cross_attention": false,
+   "out_channels": 4,
+   "projection_class_embeddings_input_dim": 2816,
+   "resnet_out_scale_factor": 1.0,
+   "resnet_skip_time_act": false,
+   "resnet_time_scale_shift": "default",
+   "reverse_transformer_layers_per_block": null,
+   "sample_size": 128,
+   "time_cond_proj_dim": null,
+   "time_embedding_act_fn": null,
+   "time_embedding_dim": null,
+   "time_embedding_type": "positional",
+   "timestep_post_act": null,
+   "transformer_layers_per_block": [
+     1,
+     2,
+     10
+   ],
+   "up_block_types": [
+     "CrossAttnUpBlock2D",
+     "CrossAttnUpBlock2D",
+     "UpBlock2D"
+   ],
+   "upcast_attention": null,
+   "use_linear_projection": true
+ }
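The `projection_class_embeddings_input_dim` of 2816 comes from SDXL's added "text_time" conditioning. The pooled, projected embedding from `text_encoder_2` (1280 values) is concatenated with six size/crop numbers: original size, crop top-left, and target size. Each of those numbers is embedded into `addition_time_embed_dim` = 256 features. Similarly, a `sample_size` of 128 latents corresponds to 1024 px images once the VAE's 8x downsampling is applied. The sketch below checks that arithmetic. `add_time_ids` is an illustrative helper that assumes the ordering used by the `diffusers` SDXL pipeline, with no cropping.

```python
# Values copied from unet/config.json, text_encoder_2/config.json and vae/config.json.
ADDITION_TIME_EMBED_DIM = 256
PROJECTION_CLASS_EMBEDDINGS_INPUT_DIM = 2816
POOLED_TEXT_DIM = 1280             # text_encoder_2 "projection_dim"
SAMPLE_SIZE = 128                  # unet "sample_size" (latent height/width)
VAE_BLOCKS = [128, 256, 512, 512]  # vae "block_out_channels"

def add_time_ids(width: int, height: int, crop_top: int = 0, crop_left: int = 0) -> list:
    """Micro-conditioning: original size and target size as (height, width), crop as (top, left)."""
    original_size = (height, width)
    crop_coords = (crop_top, crop_left)
    target_size = (height, width)
    return list(original_size + crop_coords + target_size)

time_ids = add_time_ids(832, 1216)
added_dim = POOLED_TEXT_DIM + len(time_ids) * ADDITION_TIME_EMBED_DIM
vae_scale_factor = 2 ** (len(VAE_BLOCKS) - 1)
print(time_ids, added_dim, SAMPLE_SIZE * vae_scale_factor)
assert added_dim == PROJECTION_CLASS_EMBEDDINGS_INPUT_DIM  # 1280 + 6 * 256
```

The default 128 x 128 latent size (1024 px) matches the 1024x1024 training resolution listed in the README.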
unet/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b89b0ac84d67da8db757909e2570e4a308b8896c8c893926401d8a998aea7c77
+ size 5135149760
vae/config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "_class_name": "AutoencoderKL",
+   "_diffusers_version": "0.32.2",
+   "_name_or_path": "madebyollin/sdxl-vae-fp16-fix",
+   "act_fn": "silu",
+   "block_out_channels": [
+     128,
+     256,
+     512,
+     512
+   ],
+   "down_block_types": [
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D",
+     "DownEncoderBlock2D"
+   ],
+   "force_upcast": false,
+   "in_channels": 3,
+   "latent_channels": 4,
+   "latents_mean": null,
+   "latents_std": null,
+   "layers_per_block": 2,
+   "mid_block_add_attention": true,
+   "norm_num_groups": 32,
+   "out_channels": 3,
+   "sample_size": 512,
+   "scaling_factor": 0.13025,
+   "shift_factor": null,
+   "up_block_types": [
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D",
+     "UpDecoderBlock2D"
+   ],
+   "use_post_quant_conv": true,
+   "use_quant_conv": true
+ }
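`vae/config.json` lists four `block_out_channels`, so the encoder downsamples three times (a factor of 2^3 = 8) and maps each RGB image to 4 latent channels. The sketch below computes the latent tensor shape for a few of the README's recommended resolutions. The helper name is illustrative. Separately, latents are multiplied by `scaling_factor` (0.13025) after encoding and divided by it before decoding. Because `force_upcast` is `false`, `diffusers` does not upcast this VAE to float32 for decoding, which is consistent with the FP16-fix VAE named in `_name_or_path`.

```python
# Values copied from vae/config.json.
BLOCK_OUT_CHANNELS = [128, 256, 512, 512]
LATENT_CHANNELS = 4

# Every encoder block except the last halves height and width.
VAE_SCALE_FACTOR = 2 ** (len(BLOCK_OUT_CHANNELS) - 1)

def latent_shape(width: int, height: int, batch: int = 1) -> tuple:
    """(batch, channels, height, width) of the latents for a width x height image."""
    if width % VAE_SCALE_FACTOR or height % VAE_SCALE_FACTOR:
        raise ValueError(f"{width}x{height} is not a multiple of {VAE_SCALE_FACTOR}")
    return (batch, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)

for w, h in [(1024, 1024), (832, 1216), (1536, 640)]:
    print(f"{w}x{h} -> {latent_shape(w, h)}")
```

All recommended resolutions are multiples of 64, so they divide evenly by 8 and also pass cleanly through the UNet's own downsampling stages.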
vae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6353737672c94b96174cb590f711eac6edf2fcce5b6e91aa9d73c5adc589ee48
+ size 167335342