---
pipeline_tag: text-to-image
license: other
license_name: faipl-1.0-sd
license_link: LICENSE
decoder:
- Disty0/sotediffusion-wuerstchen3-alpha1-decoder
---
SoteDiffusion Wuerstchen3
Anime finetune of Würstchen V3.
Currently in an early stage of training.
No commercial use.
Release Notes
- Switched to OneTrainer
- Trained more.
- Currently trained on 2.6M images.
UI Guide
SD.Next
URL: https://github.com/vladmandic/automatic/
Switch to the dev branch:
```shell
git checkout dev
```
Go to Models -> Huggingface, type Disty0/sotediffusion-wuerstchen3-alpha3-decoder into the model name field and press Download.
Once the download is complete, load Disty0/sotediffusion-wuerstchen3-alpha3-decoder.
Prompt:
very aesthetic, best quality, newest,
Negative Prompt:
very displeasing, worst quality, oldest, monochrome, sketch, realistic,
Parameters:
Sampler: Default
Steps: 30 or 40
Refiner Steps: 10
CFG: 6-8
Secondary CFG: 2 or 1
Resolution: 1024x1536, 2048x1152
Any resolution works as long as both width and height are multiples of 128.
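Since both sides must be multiples of 128, a small helper can snap an arbitrary requested size to the nearest valid one before using it as width and height. This is a minimal sketch; the function names are hypothetical and not part of SD.Next or diffusers.

```python
def snap_to_multiple(value: int, base: int = 128) -> int:
    """Round a side length to the nearest multiple of `base`, never going below `base`."""
    # Python's round() sends exact halves to the even multiple.
    return max(base, round(value / base) * base)


def snap_resolution(width: int, height: int, base: int = 128) -> tuple[int, int]:
    """Snap both sides of a requested resolution to multiples of `base`."""
    return snap_to_multiple(width, base), snap_to_multiple(height, base)


def is_valid_resolution(width: int, height: int, base: int = 128) -> bool:
    """True if both sides are already multiples of `base`."""
    return width % base == 0 and height % base == 0


print(snap_resolution(1000, 1500))      # (1024, 1536)
print(is_valid_resolution(2048, 1152))  # True
```

Rounding to the nearest multiple keeps the result close to the requested aspect ratio; rounding down instead would be the safer choice if VRAM is tight.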
ComfyUI
Please refer to CivitAI: https://civitai.com/models/353284
Code Example
```shell
pip install diffusers
```

```python
import torch
from diffusers import StableCascadeCombinedPipeline

device = "cuda"
dtype = torch.bfloat16  # or torch.float16
model = "Disty0/sotediffusion-wuerstchen3-alpha3-decoder"

pipe = StableCascadeCombinedPipeline.from_pretrained(model, torch_dtype=dtype)

# send everything to the GPU:
pipe = pipe.to(device, dtype=dtype)
pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)
# or enable model offload to save VRAM (instead of the two .to() calls above):
# pipe.enable_model_cpu_offload()

prompt = "1girl, solo, cowboy shot, straight hair, looking at viewer, hoodie, indoors, slight smile, casual, furniture, doorway, very aesthetic, best quality, newest,"
negative_prompt = "very displeasing, worst quality, oldest, monochrome, sketch, realistic,"

output = pipe(
    width=1024,
    height=1536,
    prompt=prompt,
    negative_prompt=negative_prompt,
    decoder_guidance_scale=1.0,    # Secondary CFG
    prior_guidance_scale=8.0,      # CFG
    prior_num_inference_steps=40,  # Steps
    output_type="pil",
    num_inference_steps=10,        # Refiner (decoder) steps
).images[0]

# do something with the output image, e.g. save it to disk:
output.save("output.png")
```
Training Status:
GPU used for training: 1x AMD RX 7900 XTX 24GB
GPU Hours: 500 (cumulative, counted from alpha1)
dataset name | training done | remaining |
---|---|---|
newest | 100 | 131 |
recent | 040 | 132 |
mid | 040 | 084 |
early | 040 | 030 |
oldest | done | done |
pixiv | done | done |
visual novel cg | done | done |
anime wallpaper | done | done |
Total | 333 | 375 |
Note: chunk numbering starts from 0, and each chunk contains 8000 images.
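With 0-based numbering and 8000 images per chunk, chunk k covers image k × 8000 up to image (k + 1) × 8000 - 1, and chunks 0 through k inclusive are k + 1 chunks. A small sketch of that arithmetic (the helper names are hypothetical):

```python
IMAGES_PER_CHUNK = 8000  # from the note above


def chunk_image_range(chunk_index: int) -> range:
    """0-based image indices covered by a 0-based chunk index."""
    start = chunk_index * IMAGES_PER_CHUNK
    return range(start, start + IMAGES_PER_CHUNK)


def chunks_through(chunk_index: int) -> int:
    """Number of chunks in 0..chunk_index inclusive (numbering starts at 0)."""
    return chunk_index + 1


r = chunk_image_range(100)
print(r.start, r.stop - 1)  # 800000 807999
print(chunks_through(100))  # 101
```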
Dataset:
GPU used for captioning: 1x Intel ARC A770 16GB
GPU Hours: 350
Model used for captioning: SmilingWolf/wd-swinv2-tagger-v3
Command:
```shell
python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./
```
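The command writes one .txt caption per image (--caption_extension ".txt") with tags joined by ", " (--caption_separator). Per-tag counts can be tallied from those files with a short script along these lines; count_tags is a hypothetical helper, not part of kohya_ss sd-scripts.

```python
from collections import Counter
from pathlib import Path
from typing import Union


def count_tags(root: Union[str, Path], extension: str = ".txt") -> Counter:
    """Tally tags across all caption files under `root`, searched recursively."""
    counts: Counter = Counter()
    for caption_file in Path(root).rglob(f"*{extension}"):
        text = caption_file.read_text(encoding="utf-8")
        # Tags are joined with ", "; splitting on "," and stripping whitespace
        # also tolerates a trailing comma at the end of the caption.
        tags = [tag.strip() for tag in text.split(",") if tag.strip()]
        counts.update(tags)
    return counts
```

Usage: count_tags("./").most_common(20) lists the 20 most frequent tags in the dataset folder.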
dataset name | total images | total chunks |
---|---|---|
newest | 1.848.331 | 232 |
recent | 1.380.630 | 173 |
mid | 993.227 | 125 |
early | 566.152 | 071 |
oldest | 160.397 | 021 |
pixiv | 343.614 | 043 |
visual novel cg | 231.358 | 029 |
anime wallpaper | 104.790 | 014 |
Total | 5.628.499 | 708 |
Note:
- Smallest image size is 1280x600 (768.000 pixels).
- Deduplicated by image similarity using czkawka-cli.
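The chunk counts in the table above follow from the image counts: with 8000 images per chunk, each dataset needs ceil(images / 8000) chunks, presumably with the last one only partially filled. The check below reproduces the table's per-dataset chunk counts and both totals.

```python
IMAGES_PER_CHUNK = 8000

# Image counts copied from the dataset table above.
dataset_sizes = {
    "newest": 1_848_331,
    "recent": 1_380_630,
    "mid": 993_227,
    "early": 566_152,
    "oldest": 160_397,
    "pixiv": 343_614,
    "visual novel cg": 231_358,
    "anime wallpaper": 104_790,
}


def chunk_count(num_images: int, per_chunk: int = IMAGES_PER_CHUNK) -> int:
    """Integer ceiling division: a final partial chunk still counts as a chunk."""
    return -(-num_images // per_chunk)


chunks = {name: chunk_count(n) for name, n in dataset_sizes.items()}
print(chunks["newest"])             # 232
print(sum(dataset_sizes.values()))  # 5628499
print(sum(chunks.values()))         # 708
```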
Tags:
The model is trained with a random tag order, but for reference, this is the order used in the dataset:
aesthetic tags, quality tags, date tags, custom tags, rating tags, character, series, rest of the tags
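The training config below sets "enable_tag_shuffling": true, "tag_delimiter": ", " and "keep_tags_count": 1, which suggests the first tag (the aesthetic tag, in the dataset order above) stays in place while the rest are shuffled. A minimal sketch of that behavior, written from the config fields rather than from OneTrainer's source:

```python
import random
from typing import Optional


def shuffle_tags(caption: str, delimiter: str = ", ", keep_tags_count: int = 1,
                 rng: Optional[random.Random] = None) -> str:
    """Keep the first `keep_tags_count` tags fixed and shuffle the rest."""
    rng = rng or random.Random()
    # Split on the delimiter and drop empty entries (e.g. from a trailing ", ").
    tags = [t.strip() for t in caption.split(delimiter.strip()) if t.strip()]
    kept, rest = tags[:keep_tags_count], tags[keep_tags_count:]
    rng.shuffle(rest)
    return delimiter.join(kept + rest)


caption = "very aesthetic, best quality, newest, sensitive, 1girl, solo, upper body,"
print(shuffle_tags(caption, rng=random.Random(0)))
```

Passing a seeded random.Random makes the shuffle reproducible for debugging; during training a fresh order per sample is the point.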
Date:
tag | date |
---|---|
newest | 2022 to 2024 |
recent | 2019 to 2021 |
mid | 2015 to 2018 |
early | 2011 to 2014 |
oldest | 2005 to 2010 |
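The date table maps an image's year to a tag, with inclusive year ranges. Which date field of the source image is used is not stated here, so the sketch below simply takes a year; date_tag is a hypothetical helper.

```python
from typing import Optional

# (first year, last year, tag) from the table above; ranges are inclusive.
DATE_TAGS = [
    (2022, 2024, "newest"),
    (2019, 2021, "recent"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if it falls outside every range."""
    for first, last, tag in DATE_TAGS:
        if first <= year <= last:
            return tag
    return None


print(date_tag(2023), date_tag(2012))  # newest early
```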
Aesthetic Tags:
Model used: shadowlilac/aesthetic-shadow-v2
score greater than | tag | count |
---|---|---|
0.90 | extremely aesthetic | 125.451 |
0.80 | very aesthetic | 887.382 |
0.70 | aesthetic | 1.049.857 |
0.50 | slightly aesthetic | 1.643.091 |
0.40 | not displeasing | 569.543 |
0.30 | not aesthetic | 445.188 |
0.20 | slightly displeasing | 341.424 |
0.10 | displeasing | 237.660 |
rest of them | very displeasing | 328.712 |
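Each row is a lower bound: an image gets the first tag whose threshold its score strictly exceeds, and anything at or below 0.10 becomes "very displeasing". A sketch of that lookup, assuming the aesthetic-shadow-v2 score is a single float in [0, 1] (how the score is read out of the model is not shown here):

```python
# (score must be greater than, tag), checked from the highest threshold down.
AESTHETIC_THRESHOLDS = [
    (0.90, "extremely aesthetic"),
    (0.80, "very aesthetic"),
    (0.70, "aesthetic"),
    (0.50, "slightly aesthetic"),
    (0.40, "not displeasing"),
    (0.30, "not aesthetic"),
    (0.20, "slightly displeasing"),
    (0.10, "displeasing"),
]


def aesthetic_tag(score: float) -> str:
    """Map an aesthetic score to its tag using the table's 'greater than' rule."""
    for threshold, tag in AESTHETIC_THRESHOLDS:
        if score > threshold:
            return tag
    return "very displeasing"  # "rest of them"


print(aesthetic_tag(0.85))  # very aesthetic
```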
Quality Tags:
Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth
score greater than | tag | count |
---|---|---|
0.980 | best quality | 1.270.447 |
0.900 | high quality | 498.244 |
0.750 | great quality | 351.006 |
0.500 | medium quality | 366.448 |
0.250 | normal quality | 368.380 |
0.125 | bad quality | 279.050 |
0.025 | low quality | 538.958 |
rest of them | worst quality | 1.955.966 |
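The quality tags follow the same "score greater than" rule with different thresholds. This sketch uses bisect_left on the thresholds in ascending order, which returns how many thresholds lie strictly below the score; it assumes the aes-B32-v0 model yields a single float in [0, 1].

```python
from bisect import bisect_left

# Thresholds in ascending order; QUALITY_TAGS[i] applies when the score
# exceeds exactly i of them.
QUALITY_THRESHOLDS = [0.025, 0.125, 0.250, 0.500, 0.750, 0.900, 0.980]
QUALITY_TAGS = [
    "worst quality",   # score <= 0.025
    "low quality",
    "bad quality",
    "normal quality",
    "medium quality",
    "great quality",
    "high quality",
    "best quality",    # score > 0.980
]


def quality_tag(score: float) -> str:
    """Map a quality score to its tag; a score equal to a threshold stays below it."""
    return QUALITY_TAGS[bisect_left(QUALITY_THRESHOLDS, score)]


print(quality_tag(0.99), quality_tag(0.98))  # best quality high quality
```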
Rating Tags:
tag | count |
---|---|
general | 1.416.451 |
sensitive | 3.447.664 |
nsfw | 427.459 |
explicit nsfw | 336.925 |
Custom Tags:
dataset name | custom tag |
---|---|
image boards | date, |
pixiv | art by Display_Name, |
visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
anime wallpaper | date, anime wallpaper, |
Training Parameters:
Software used: OneTrainer
https://github.com/Nerogar/OneTrainer/
Base model: Disty0/sote-diffusion-cascade-alpha2
Config:
{
"__version": 3,
"training_method": "FINE_TUNE",
"model_type": "STABLE_CASCADE_1",
"debug_mode": false,
"debug_dir": "debug",
"workspace_dir": "/mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/OneTrainer/run",
"cache_dir": "/mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/OneTrainer/workspace-cache/run",
"tensorboard": true,
"tensorboard_expose": false,
"continue_last_backup": true,
"include_train_config": "NONE",
"base_model_name": "Disty0/sotediffusion-wuerstchen3-alpha2",
"weight_dtype": "BFLOAT_16",
"output_dtype": "BFLOAT_16",
"output_model_format": "SAFETENSORS",
"output_model_destination": "/mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/OneTrainer/workspace-cache/models",
"gradient_checkpointing": true,
"force_circular_padding": false,
"concept_file_name": "training_concepts/concepts.json",
"concepts": [
{
"__version": 1,
"image": {
"__version": 0,
"enable_crop_jitter": false,
"enable_random_flip": false,
"enable_fixed_flip": false,
"enable_random_rotate": false,
"enable_fixed_rotate": false,
"random_rotate_max_angle": 0.0,
"enable_random_brightness": false,
"enable_fixed_brightness": false,
"random_brightness_max_strength": 0.0,
"enable_random_contrast": false,
"enable_fixed_contrast": false,
"random_contrast_max_strength": 0.0,
"enable_random_saturation": false,
"enable_fixed_saturation": false,
"random_saturation_max_strength": 0.0,
"enable_random_hue": false,
"enable_fixed_hue": false,
"random_hue_max_strength": 0.0,
"enable_resolution_override": false,
"resolution_override": "1024"
},
"text": {
"__version": 0,
"prompt_source": "sample",
"prompt_path": "",
"enable_tag_shuffling": true,
"tag_delimiter": ", ",
"keep_tags_count": 1
},
"name": "",
"path": "/mnt/DataSSD/AI/anime_image_dataset/best/newest_best",
"seed": -209204630,
"enabled": true,
"include_subdirectories": true,
"image_variations": 1,
"text_variations": 1,
"balancing": 1.0,
"balancing_strategy": "REPEATS",
"loss_weight": 1.0
}
],
"circular_mask_generation": false,
"random_rotate_and_crop": false,
"aspect_ratio_bucketing": true,
"latent_caching": true,
"clear_cache_before_training": false,
"learning_rate_scheduler": "CONSTANT",
"learning_rate": 1e-05,
"learning_rate_warmup_steps": 200,
"learning_rate_cycles": 1,
"epochs": 1,
"batch_size": 16,
"gradient_accumulation_steps": 1,
"ema": "OFF",
"ema_decay": 0.999,
"ema_update_step_interval": 5,
"dataloader_threads": 8,
"train_device": "cuda",
"temp_device": "cpu",
"train_dtype": "FLOAT_16",
"fallback_train_dtype": "BFLOAT_16",
"enable_autocast_cache": true,
"only_cache": false,
"resolution": "1024",
"attention_mechanism": "SDP",
"align_prop": false,
"align_prop_probability": 0.1,
"align_prop_loss": "AESTHETIC",
"align_prop_weight": 0.01,
"align_prop_steps": 20,
"align_prop_truncate_steps": 0.5,
"align_prop_cfg_scale": 7.0,
"mse_strength": 1.0,
"mae_strength": 0.0,
"vb_loss_strength": 1.0,
"loss_weight_fn": "P2",
"loss_weight_strength": 1.0,
"dropout_probability": 0.0,
"loss_scaler": "NONE",
"learning_rate_scaler": "NONE",
"offset_noise_weight": 0.0,
"perturbation_noise_weight": 0.0,
"rescale_noise_scheduler_to_zero_terminal_snr": false,
"force_v_prediction": false,
"force_epsilon_prediction": false,
"min_noising_strength": 0.0,
"max_noising_strength": 1.0,
"noising_weight": 0.0,
"noising_bias": 0.5,
"unet": {
"__version": 0,
"model_name": "",
"train": true,
"stop_training_after": 0,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "NONE"
},
"prior": {
"__version": 0,
"model_name": "",
"train": true,
"stop_training_after": 0,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "NONE"
},
"text_encoder": {
"__version": 0,
"model_name": "",
"train": true,
"stop_training_after": 0,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "NONE"
},
"text_encoder_layer_skip": 0,
"text_encoder_2": {
"__version": 0,
"model_name": "",
"train": true,
"stop_training_after": 30,
"stop_training_after_unit": "EPOCH",
"learning_rate": null,
"weight_dtype": "NONE"
},
"text_encoder_2_layer_skip": 0,
"vae": {
"__version": 0,
"model_name": "",
"train": true,
"stop_training_after": null,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "FLOAT_32"
},
"effnet_encoder": {
"__version": 0,
"model_name": "/mnt/DataSSD/AI/models/wuerstchen3/effnet_encoder.safetensors",
"train": true,
"stop_training_after": null,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "FLOAT_16"
},
"decoder": {
"__version": 0,
"model_name": "Disty0/sotediffusion-wuerstchen3-alpha2-decoder",
"train": true,
"stop_training_after": null,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "FLOAT_16"
},
"decoder_text_encoder": {
"__version": 0,
"model_name": "",
"train": true,
"stop_training_after": null,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "NONE"
},
"decoder_vqgan": {
"__version": 0,
"model_name": "",
"train": true,
"stop_training_after": null,
"stop_training_after_unit": "NEVER",
"learning_rate": null,
"weight_dtype": "FLOAT_16"
},
"masked_training": false,
"unmasked_probability": 0.1,
"unmasked_weight": 0.1,
"normalize_masked_area_loss": false,
"embedding_learning_rate": null,
"preserve_embedding_norm": false,
"embedding": {
"__version": 0,
"uuid": "bf3a36b4-bd01-4b46-b818-3c6414887497",
"model_name": "",
"placeholder": "<embedding>",
"train": true,
"stop_training_after": null,
"stop_training_after_unit": "NEVER",
"token_count": 1,
"initial_embedding_text": "*"
},
"additional_embeddings": [],
"embedding_weight_dtype": "FLOAT_32",
"lora_model_name": "",
"lora_rank": 16,
"lora_alpha": 1.0,
"lora_weight_dtype": "FLOAT_32",
"optimizer": {
"__version": 0,
"optimizer": "ADAFACTOR",
"adam_w_mode": false,
"alpha": null,
"amsgrad": false,
"beta1": null,
"beta2": null,
"beta3": null,
"bias_correction": false,
"block_wise": false,
"capturable": false,
"centered": false,
"clip_threshold": 1.0,
"d0": null,
"d_coef": null,
"dampening": null,
"decay_rate": -0.8,
"decouple": false,
"differentiable": false,
"eps": 1e-30,
"eps2": 0.001,
"foreach": false,
"fsdp_in_use": false,
"fused": false,
"fused_back_pass": true,
"growth_rate": null,
"initial_accumulator_value": null,
"is_paged": false,
"log_every": null,
"lr_decay": null,
"max_unorm": null,
"maximize": false,
"min_8bit_size": null,
"momentum": null,
"nesterov": false,
"no_prox": false,
"optim_bits": null,
"percentile_clipping": null,
"r": null,
"relative_step": false,
"safeguard_warmup": false,
"scale_parameter": false,
"stochastic_rounding": true,
"use_bias_correction": false,
"use_triton": false,
"warmup_init": false,
"weight_decay": 0.0,
"weight_lr_power": null
},
"optimizer_defaults": {
"ADAFACTOR": {
"__version": 0,
"optimizer": "ADAFACTOR",
"adam_w_mode": false,
"alpha": null,
"amsgrad": false,
"beta1": null,
"beta2": null,
"beta3": null,
"bias_correction": false,
"block_wise": false,
"capturable": false,
"centered": false,
"clip_threshold": 1.0,
"d0": null,
"d_coef": null,
"dampening": null,
"decay_rate": -0.8,
"decouple": false,
"differentiable": false,
"eps": 1e-30,
"eps2": 0.001,
"foreach": false,
"fsdp_in_use": false,
"fused": false,
"fused_back_pass": true,
"growth_rate": null,
"initial_accumulator_value": null,
"is_paged": false,
"log_every": null,
"lr_decay": null,
"max_unorm": null,
"maximize": false,
"min_8bit_size": null,
"momentum": null,
"nesterov": false,
"no_prox": false,
"optim_bits": null,
"percentile_clipping": null,
"r": null,
"relative_step": false,
"safeguard_warmup": false,
"scale_parameter": false,
"stochastic_rounding": true,
"use_bias_correction": false,
"use_triton": false,
"warmup_init": false,
"weight_decay": 0.0,
"weight_lr_power": null
}
},
"sample_definition_file_name": "training_samples/samples.json",
"samples": [
{
"__version": 0,
"enabled": true,
"prompt": "very aesthetic, best quality, newest, sensitive, 1girl, solo, upper body,",
"negative_prompt": "monochrome, sketch, fat,",
"height": 1024,
"width": 1024,
"seed": 42,
"random_seed": true,
"diffusion_steps": 30,
"cfg_scale": 7.0,
"noise_scheduler": "EULER_A"
}
],
"sample_after": 30,
"sample_after_unit": "MINUTE",
"sample_image_format": "JPG",
"samples_to_tensorboard": true,
"non_ema_sampling": true,
"backup_after": 30,
"backup_after_unit": "MINUTE",
"rolling_backup": true,
"rolling_backup_count": 10,
"backup_before_save": true,
"save_after": 30,
"save_after_unit": "MINUTE",
"save_filename_prefix": ""
}
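A couple of numbers follow directly from the config: with "batch_size": 16 and "gradient_accumulation_steps": 1, each optimizer step sees 16 images, so a single pass over one 8000-image chunk is about 500 steps. The sketch recomputes this from an excerpt of the config; it assumes each chunk is seen once ("epochs": 1) and ignores aspect-ratio bucketing ("aspect_ratio_bucketing": true), which can leave partially filled buckets, so real step counts may differ slightly.

```python
import json

# Excerpt of the OneTrainer config above (values copied from it).
CONFIG_EXCERPT = """
{
    "epochs": 1,
    "batch_size": 16,
    "gradient_accumulation_steps": 1
}
"""
IMAGES_PER_CHUNK = 8000  # from the Training Status note

config = json.loads(CONFIG_EXCERPT)

# Images contributing to each optimizer step.
effective_batch = config["batch_size"] * config["gradient_accumulation_steps"]
# Approximate optimizer steps for one chunk.
steps_per_chunk = config["epochs"] * IMAGES_PER_CHUNK // effective_batch

print(effective_batch, steps_per_chunk)  # 16 500
```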
Limitations and Bias
Bias
- This model is intended for anime illustrations. Realistic capabilities are not tested at all.
Limitations
- Can fall back to a realistic style. Add the "realistic" tag to the negative prompt when this happens.
- Eyes in far shots can be bad.
- Anatomy and hands can be bad.
- Still in active training.
License
SoteDiffusion models fall under the Fair AI Public License 1.0-SD, which is compatible with the Stable Diffusion models’ license. Key points:
- Modification Sharing: If you modify SoteDiffusion models, you must share both your changes and the original license.
- Source Code Accessibility: If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
- Distribution Terms: Any distribution must be under this license or another with similar rules.
- Compliance: Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.
Note: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial license, included as LICENSE_INHERIT. This means there is still no commercial use of any kind.