Update README.md
README.md
  - stable-diffusion-xl
base_model: cagliostrolab/animagine-xl-3.0
widget:
- text: 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality, very aesthetic, absurdres
  parameters:
    negative_prompt: nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
  example_title: 1girl
- text: 1boy, male focus, green hair, sweater, looking at viewer, upper body, beanie, outdoors, night, turtleneck, masterpiece, best quality, very aesthetic, absurdres
  parameters:
    negative_prompt: nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
  example_title: 1boy
---
<style>
## Anime-focused Dataset Additions

On Animagine XL 3.0, we mostly added characters from popular gacha games. Based on user feedback, we have added many popular anime franchises to the dataset for this model. We will publish the full list of characters this iteration can generate on our HuggingFace page soon, so be sure to check it out when it's up!

## Model Details

- **Developed by**: [Cagliostro Research Lab](https://huggingface.co/cagliostrolab)
- **Model type**: Diffusion-based text-to-image generative model
To use Animagine XL 3.1, install the required libraries as follows:

```bash
pip install diffusers transformers accelerate safetensors --upgrade
```
Example script for generating images with Animagine XL 3.1:

```python
import torch
from diffusers import DiffusionPipeline

# Load the pipeline in half precision with safetensors weights
pipe = DiffusionPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16,
    use_safetensors=True,
)
pipe.to('cuda')

prompt = "1girl, souryuu asuka langley, neon genesis evangelion, solo, upper body, v, smile, looking at viewer, outdoors, night"
negative_prompt = "nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=1216,
    guidance_scale=7,
    num_inference_steps=28
).images[0]

image.save("./asuka_test.png")
```
## Usage Guidelines

### Tag Ordering

For optimal results, we recommend following this structured prompt template, since it reflects how the model was trained:

```
1girl/1boy, character name, from what series, everything else in any order.
```
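As a minimal sketch, assembling a prompt in this order might look like the following; the character, series, and detail tags below are illustrative examples, not requirements:

```python
# Illustrative sketch: composing a prompt in the trained tag order.
# All tag values below are examples; any character/series the model knows works.
subject = "1girl"
character = "souryuu asuka langley"   # character name
series = "neon genesis evangelion"    # from what series
details = ["solo", "upper body", "smile", "outdoors", "night"]
quality = ["masterpiece", "best quality", "very aesthetic", "absurdres"]

prompt = ", ".join([subject, character, series, *details, *quality])
print(prompt)
# -> 1girl, souryuu asuka langley, neon genesis evangelion, solo, upper body,
#    smile, outdoors, night, masterpiece, best quality, very aesthetic, absurdres
```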
### Quality Modifiers

Quality tags now consider both scores and post ratings to ensure a balanced quality distribution. We've refined labels for greater clarity, such as changing 'high quality' to 'great quality'.

| Quality Modifier | Score Criterion |
|------------------|-----------------|
| `masterpiece`    | > 95%           |
| `best quality`   | > 85% & ≤ 95%   |
| `great quality`  | > 75% & ≤ 85%   |
| `good quality`   | > 50% & ≤ 75%   |
| `normal quality` | > 25% & ≤ 50%   |
| `low quality`    | > 10% & ≤ 25%   |
| `worst quality`  | ≤ 10%           |
### Rating Modifiers

We've also streamlined our rating tags for simplicity and clarity, aiming to establish global rules that can be applied across different models. For example, the tag 'rating: general' is now simply 'general', and 'rating: sensitive' has been condensed to 'sensitive'.

| Rating Modifier  | Rating Criterion |
|------------------|------------------|
| `general`        | General          |
| `sensitive`      | Sensitive        |
| `nsfw`           | Questionable     |
| `explicit, nsfw` | Explicit         |
### Year Modifier
|
289 |
|
290 |
+
We've also redefined the year range to steer results towards specific modern or vintage anime art styles more accurately. This update simplifies the range, focusing on relevance to current and past eras.
|
291 |
|
292 |
| Year Tag | Year Range |
|
293 |
+
|----------|------------------|
|
294 |
+
| `newest` | 2021 to 2024 |
|
295 |
+
| `recent` | 2018 to 2020 |
|
296 |
+
| `mid` | 2015 to 2017 |
|
297 |
| `early` | 2011 to 2014 |
|
298 |
| `oldest` | 2005 to 2010 |
|
299 |
|
300 |
### Aesthetic Tags
|
301 |
|
302 |
+
We've enhanced our tagging system with aesthetic tags to refine content categorization based on visual appeal. These tags—`very aesthetic`, `aesthetic`, `displeasing`, and `very displeasing`—are derived from evaluations made by a specialized ViT (Vision Transformer) image classification model, specifically trained on anime data. For this purpose, we utilized the model [shadowlilac/aesthetic-shadow-v2](https://huggingface.co/shadowlilac/aesthetic-shadow-v2), which assesses the aesthetic value of content before it undergoes training. This ensures that each piece of content is not only relevant and accurate but also visually appealing.
|
303 |
|
304 |
+
| Aesthetic Tag | Score Range |
|
305 |
+
|-------------------|-------------------|
|
306 |
+
| `very aesthetic` | > 0.71 |
|
307 |
+
| `aesthetic` | > 0.45 & < 0.71 |
|
308 |
+
| `displeasing` | > 0.27 & < 0.45 |
|
309 |
+
| `very displeasing`| ≤ 0.27 |
|
310 |
|
311 |
## Recommended settings
|
312 |
|
313 |
To guide the model towards generating high-aesthetic images, use negative prompts like:
|
314 |
|
315 |
```
|
316 |
+
nsfw, lowres, (bad), text, error, fewer, extra, missing, worst quality, jpeg artifacts, low quality, watermark, unfinished, displeasing, oldest, early, chromatic aberration, signature, extra digits, artistic error, username, scan, [abstract]
|
317 |
```
|
318 |
|
319 |
For higher quality outcomes, prepend prompts with:
|
320 |
|
321 |
```
|
322 |
+
masterpiece, best quality, very aesthetic, absurdres
|
323 |
```
|
324 |
|
325 |
+
it’s recommended to use a lower classifier-free guidance (CFG Scale) of around 5-7, sampling steps below 30, and to use Euler Ancestral (Euler a) as a sampler.
|
326 |
|
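Applied to the diffusers pipeline from the example script above, these settings look roughly like this (the prompt values are shortened placeholders):

```python
# Sketch: recommended sampler and settings on the Animagine XL 3.1 pipeline.
import torch
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler

pipe = DiffusionPipeline.from_pretrained(
    "cagliostrolab/animagine-xl-3.1",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

# Swap in Euler Ancestral (Euler a), reusing the pipeline's scheduler config
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "1girl, masterpiece, best quality, very aesthetic, absurdres",      # placeholder prompt
    negative_prompt="nsfw, lowres, (bad), worst quality, displeasing",  # shortened
    guidance_scale=6,        # recommended CFG range: 5-7
    num_inference_steps=28,  # keep below 30
).images[0]
```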
### Multi Aspect Resolution

This model supports generating images at the following dimensions:
## Training and Hyperparameters

- **Animagine XL 3.1** was trained on 2x A100 80GB GPUs for roughly 15 days, over 350 GPU hours in the pretraining stage. The training process encompassed three stages:
  - Continual Pretraining:
    - **Pretraining Stage**: uses a data-rich collection of roughly 870k ordered, tagged images to expand the knowledge of the Animagine XL 3.0 model.
  - Finetuning:
    - **First Stage**: uses labeled and curated aesthetic datasets to repair the U-Net after pretraining.
    - **Second Stage**: uses labeled and curated aesthetic datasets to refine the model's art style and fix bad hands and anatomy.
### Hyperparameters
|
353 |
|
354 |
+
| Stage | Epochs | UNet lr | Train Text Encoder | Batch Size | Noise Offset | Optimizer | LR Scheduler | Grad Acc Steps | GPUs |
|
355 |
+
|-----------------------|--------|---------|--------------------|------------|--------------|------------|-------------------------------|----------------|------|
|
356 |
+
| **Pretraining Stage** | 10 | 1e-5 | True | 16 | N/A | AdamW | Cosine Annealing Warm Restart | 3 | 2 |
|
357 |
+
| **First Stage** | 10 | 2e-6 | False | 48 | 0.0357 | Adafactor | Constant with Warmup | 1 | 1 |
|
358 |
+
| **Second Stage** | 15 | 1e-6 | False | 48 | 0.0357 | Adafactor | Constant with Warmup | 1 | 1 |
|
359 |
|
360 |
+
## Model Comparison (Pretraining only)
|
361 |
|
362 |
### Training Config
|
363 |
|
364 |
+
| Configuration Item | Animagine XL 3.0 | Animagine XL 3.1 |
|
365 |
+
|---------------------------------|------------------------------------------|------------------------------------------------|
|
366 |
+
| **GPU** | 2 x A100 80G | 2 x A100 80G |
|
367 |
+
| **Dataset** | 1,271,990 | 873,504 |
|
368 |
+
| **Shuffle Separator** | True | True |
|
369 |
+
| **Num Epochs** | 10 | 10 |
|
370 |
+
| **Learning Rate** | 7.5e-6 | 1e-5 |
|
371 |
+
| **Text Encoder Learning Rate** | 3.75e-6 | 1e-5 |
|
372 |
+
| **Effective Batch Size** | 48 x 1 x 2 | 16 x 3 x 2 |
|
373 |
+
| **Optimizer** | Adafactor | AdamW |
|
374 |
+
| **Optimizer Args** | Scale Parameter: False, Relative Step: False, Warmup Init: False | Weight Decay: 0.1, Betas: (0.9, 0.99) |
|
375 |
+
| **LR Scheduler** | Constant with Warmup | Cosine Annealing Warm Restart |
|
376 |
+
| **LR Scheduler Args** | Warmup Steps: 100 | Num Cycles: 10, Min LR: 1e-6, LR Decay: 0.9, First Cycle Steps: 9,099 |
|
377 |
|
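As an illustrative sketch only (the actual training code lives in the sd-scripts repository linked below), the Animagine XL 3.1 optimizer and scheduler settings correspond roughly to these PyTorch objects; note that PyTorch's built-in `CosineAnnealingWarmRestarts` has no per-cycle LR decay, so the `LR Decay: 0.9` entry reflects sd-scripts' own scheduler variant:

```python
import torch

model = torch.nn.Linear(16, 16)  # hypothetical stand-in for the SDXL UNet

# AdamW with the 3.1 optimizer args from the table above
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-5,             # UNet learning rate (pretraining stage)
    betas=(0.9, 0.99),
    weight_decay=0.1,
)

# First cycle of 9,099 steps with a minimum LR of 1e-6
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=9099, eta_min=1e-6
)
```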
Source code and training config are available here: https://github.com/cagliostrolab/sd-scripts/tree/main/notebook