---
pipeline_tag: text-to-image
license: other
license_name: faipl-1.0-sd
license_link: LICENSE
decoder:
- Disty0/sotediffusion-wuerstchen3-alpha1-decoder
---

# SoteDiffusion Wuerstchen3

Anime finetune of Würstchen V3. Currently in an early state of training. No commercial use.

# Release Notes
- Switched to OneTrainer.
- Trained more.
- Currently trained on 2.6M images.
# UI Guide

## SD.Next

URL: https://github.com/vladmandic/automatic/

Switch to the dev branch:
```
git checkout dev
```

Go to Models -> Huggingface, type `Disty0/sotediffusion-wuerstchen3-alpha3-decoder` into the model name field, and press Download.
Load `Disty0/sotediffusion-wuerstchen3-alpha3-decoder` after the download process is complete.

Prompt:
```
very aesthetic, best quality, newest,
```

Negative Prompt:
```
very displeasing, worst quality, oldest, monochrome, sketch, realistic,
```

Parameters:
- Sampler: Default
- Steps: 30 or 40
- Refiner Steps: 10
- CFG: 6-8
- Secondary CFG: 2 or 1
- Resolution: 1024x1536, 2048x1152

Any resolution works as long as it's a multiple of 128.

## ComfyUI

Please refer to CivitAI: https://civitai.com/models/353284

# Code Example

```shell
pip install diffusers
```

```python
import torch
from diffusers import StableCascadeCombinedPipeline

device = "cuda"
dtype = torch.bfloat16 # or torch.float16
model = "Disty0/sotediffusion-wuerstchen3-alpha3-decoder"

pipe = StableCascadeCombinedPipeline.from_pretrained(model, torch_dtype=dtype)

# send everything to the gpu:
pipe = pipe.to(device, dtype=dtype)
pipe.prior_pipe = pipe.prior_pipe.to(device, dtype=dtype)
# or enable model offload to save vram:
# pipe.enable_model_cpu_offload()

prompt = "1girl, solo, cowboy shot, straight hair, looking at viewer, hoodie, indoors, slight smile, casual, furniture, doorway, very aesthetic, best quality, newest,"
negative_prompt = "very displeasing, worst quality, oldest, monochrome, sketch, realistic,"

output = pipe(
    width=1024,
    height=1536,
    prompt=prompt,
    negative_prompt=negative_prompt,
    decoder_guidance_scale=1.0,
    prior_guidance_scale=8.0,
    prior_num_inference_steps=40, # steps for the prior (base model)
    output_type="pil",
    num_inference_steps=10 # steps for the decoder (refiner)
).images[0]

## do something with the output image
```

## Training Status:

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

**GPU Hours**: 500 (cumulative, starting from alpha1)

| dataset name | training done (chunks) | remaining (chunks) |
|---|---|---|
| **newest** | 100 | 131 |
| **recent** | 040 | 132 |
| **mid** | 040 | 084 |
| **early** | 040 | 030 |
| **oldest** | done | done |
| **pixiv** | done | done |
| **visual novel cg** | done | done |
| **anime wallpaper** | done | done |
| **Total** | 333 | 375 |

**Note**: Chunk numbering starts from 0 and each chunk contains 8,000 images.

## Dataset:

**GPU used for captioning**: 1x Intel ARC A770 16GB

**GPU Hours**: 350

**Model used for captioning**: SmilingWolf/wd-swinv2-tagger-v3

**Command:**
```
python /mnt/DataSSD/AI/Apps/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py --model_dir "/mnt/DataSSD/AI/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --use_rating_tags --character_tags_first --character_tag_expand --append_tags --onnx --caption_separator ", " --general_threshold 0.35 --character_threshold 0.50 --batch_size 4 --caption_extension ".txt" ./
```

| dataset name | total images | total chunks |
|---|---|---|
| **newest** | 1,848,331 | 232 |
| **recent** | 1,380,630 | 173 |
| **mid** | 993,227 | 125 |
| **early** | 566,152 | 071 |
| **oldest** | 160,397 | 021 |
| **pixiv** | 343,614 | 043 |
| **visual novel cg** | 231,358 | 029 |
| **anime wallpaper** | 104,790 | 014 |
| **Total** | 5,628,499 | 708 |

**Note**:
- Smallest image size is 1280x600 (768,000 pixels).
- Deduplicated based on image similarity using czkawka-cli.

## Tags:

The model is trained with a randomized tag order, but this is the order used in the dataset, if you are interested:

```
aesthetic tags, quality tags, date tags, custom tags, rating tags, character, series,
rest of the tags
```

### Date:

| tag | date |
|---|---|
| **newest** | 2022 to 2024 |
| **recent** | 2019 to 2021 |
| **mid** | 2015 to 2018 |
| **early** | 2011 to 2014 |
| **oldest** | 2005 to 2010 |

### Aesthetic Tags:

**Model used**: shadowlilac/aesthetic-shadow-v2

| score greater than | tag | count |
|---|---|---|
| **0.90** | extremely aesthetic | 125,451 |
| **0.80** | very aesthetic | 887,382 |
| **0.70** | aesthetic | 1,049,857 |
| **0.50** | slightly aesthetic | 1,643,091 |
| **0.40** | not displeasing | 569,543 |
| **0.30** | not aesthetic | 445,188 |
| **0.20** | slightly displeasing | 341,424 |
| **0.10** | displeasing | 237,660 |
| **rest of them** | very displeasing | 328,712 |

### Quality Tags:

**Model used**: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth

| score greater than | tag | count |
|---|---|---|
| **0.980** | best quality | 1,270,447 |
| **0.900** | high quality | 498,244 |
| **0.750** | great quality | 351,006 |
| **0.500** | medium quality | 366,448 |
| **0.250** | normal quality | 368,380 |
| **0.125** | bad quality | 279,050 |
| **0.025** | low quality | 538,958 |
| **rest of them** | worst quality | 1,955,966 |

### Rating Tags:

| tag | count |
|---|---|
| **general** | 1,416,451 |
| **sensitive** | 3,447,664 |
| **nsfw** | 427,459 |
| **explicit nsfw** | 336,925 |

### Custom Tags:

| dataset name | custom tag |
|---|---|
| **image boards** | date, |
| **pixiv** | art by Display_Name, |
| **visual novel cg** | Full_VN_Name (short_3_letter_name), visual novel cg, |
| **anime wallpaper** | date, anime wallpaper, |

## Training Parameters:

**Software used**: OneTrainer https://github.com/Nerogar/OneTrainer/

**Base model**: Disty0/sote-diffusion-cascade-alpha2

### Config:

```
{ "__version": 3, "training_method": "FINE_TUNE", "model_type": "STABLE_CASCADE_1", "debug_mode": false, "debug_dir": "debug", "workspace_dir": "/mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/OneTrainer/run", "cache_dir": "/mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/OneTrainer/workspace-cache/run", "tensorboard": true, "tensorboard_expose": false, "continue_last_backup": true, "include_train_config": "NONE", "base_model_name": "Disty0/sotediffusion-wuerstchen3-alpha2", "weight_dtype": "BFLOAT_16", "output_dtype": "BFLOAT_16", "output_model_format": "SAFETENSORS", "output_model_destination": "/mnt/DataSSD/AI/SoteDiffusion/Wuerstchen3/OneTrainer/workspace-cache/models", "gradient_checkpointing": true, "force_circular_padding": false, "concept_file_name": "training_concepts/concepts.json", "concepts": [ { "__version": 1, "image": { "__version": 0, "enable_crop_jitter": false, "enable_random_flip": false, "enable_fixed_flip": false, "enable_random_rotate": false, "enable_fixed_rotate": false, "random_rotate_max_angle": 0.0, "enable_random_brightness": false, "enable_fixed_brightness": false, "random_brightness_max_strength": 0.0, "enable_random_contrast": false, "enable_fixed_contrast": false, "random_contrast_max_strength": 0.0, "enable_random_saturation": false, "enable_fixed_saturation": false, "random_saturation_max_strength": 0.0, "enable_random_hue": false, "enable_fixed_hue": false, "random_hue_max_strength": 0.0, "enable_resolution_override": false, "resolution_override": "1024" }, "text": { "__version": 0, "prompt_source": "sample", "prompt_path": "", "enable_tag_shuffling": true, "tag_delimiter": ", ", "keep_tags_count": 1 }, "name": "", "path": "/mnt/DataSSD/AI/anime_image_dataset/best/newest_best", "seed": -209204630, "enabled": true,
"include_subdirectories": true, "image_variations": 1, "text_variations": 1, "balancing": 1.0, "balancing_strategy": "REPEATS", "loss_weight": 1.0 } ], "circular_mask_generation": false, "random_rotate_and_crop": false, "aspect_ratio_bucketing": true, "latent_caching": true, "clear_cache_before_training": false, "learning_rate_scheduler": "CONSTANT", "learning_rate": 1e-05, "learning_rate_warmup_steps": 200, "learning_rate_cycles": 1, "epochs": 1, "batch_size": 16, "gradient_accumulation_steps": 1, "ema": "OFF", "ema_decay": 0.999, "ema_update_step_interval": 5, "dataloader_threads": 8, "train_device": "cuda", "temp_device": "cpu", "train_dtype": "FLOAT_16", "fallback_train_dtype": "BFLOAT_16", "enable_autocast_cache": true, "only_cache": false, "resolution": "1024", "attention_mechanism": "SDP", "align_prop": false, "align_prop_probability": 0.1, "align_prop_loss": "AESTHETIC", "align_prop_weight": 0.01, "align_prop_steps": 20, "align_prop_truncate_steps": 0.5, "align_prop_cfg_scale": 7.0, "mse_strength": 1.0, "mae_strength": 0.0, "vb_loss_strength": 1.0, "loss_weight_fn": "P2", "loss_weight_strength": 1.0, "dropout_probability": 0.0, "loss_scaler": "NONE", "learning_rate_scaler": "NONE", "offset_noise_weight": 0.0, "perturbation_noise_weight": 0.0, "rescale_noise_scheduler_to_zero_terminal_snr": false, "force_v_prediction": false, "force_epsilon_prediction": false, "min_noising_strength": 0.0, "max_noising_strength": 1.0, "noising_weight": 0.0, "noising_bias": 0.5, "unet": { "__version": 0, "model_name": "", "train": true, "stop_training_after": 0, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "NONE" }, "prior": { "__version": 0, "model_name": "", "train": true, "stop_training_after": 0, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "NONE" }, "text_encoder": { "__version": 0, "model_name": "", "train": true, "stop_training_after": 0, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "NONE" }, "text_encoder_layer_skip": 0, "text_encoder_2": { "__version": 0, "model_name": "", "train": true, "stop_training_after": 30, "stop_training_after_unit": "EPOCH", "learning_rate": null, "weight_dtype": "NONE" }, "text_encoder_2_layer_skip": 0, "vae": { "__version": 0, "model_name": "", "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32" }, "effnet_encoder": { "__version": 0, "model_name": "/mnt/DataSSD/AI/models/wuerstchen3/effnet_encoder.safetensors", "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_16" }, "decoder": { "__version": 0, "model_name": "Disty0/sotediffusion-wuerstchen3-alpha2-decoder", "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_16" }, "decoder_text_encoder": { "__version": 0, "model_name": "", "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "NONE" }, "decoder_vqgan": { "__version": 0, "model_name": "", "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_16" }, "masked_training": false, "unmasked_probability": 0.1, "unmasked_weight": 0.1, "normalize_masked_area_loss": false, "embedding_learning_rate": null, "preserve_embedding_norm": false, "embedding": { "__version": 0, "uuid": "bf3a36b4-bd01-4b46-b818-3c6414887497", 
"model_name": "", "placeholder": "", "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "token_count": 1, "initial_embedding_text": "*" }, "additional_embeddings": [], "embedding_weight_dtype": "FLOAT_32", "lora_model_name": "", "lora_rank": 16, "lora_alpha": 1.0, "lora_weight_dtype": "FLOAT_32", "optimizer": { "__version": 0, "optimizer": "ADAFACTOR", "adam_w_mode": false, "alpha": null, "amsgrad": false, "beta1": null, "beta2": null, "beta3": null, "bias_correction": false, "block_wise": false, "capturable": false, "centered": false, "clip_threshold": 1.0, "d0": null, "d_coef": null, "dampening": null, "decay_rate": -0.8, "decouple": false, "differentiable": false, "eps": 1e-30, "eps2": 0.001, "foreach": false, "fsdp_in_use": false, "fused": false, "fused_back_pass": true, "growth_rate": null, "initial_accumulator_value": null, "is_paged": false, "log_every": null, "lr_decay": null, "max_unorm": null, "maximize": false, "min_8bit_size": null, "momentum": null, "nesterov": false, "no_prox": false, "optim_bits": null, "percentile_clipping": null, "r": null, "relative_step": false, "safeguard_warmup": false, "scale_parameter": false, "stochastic_rounding": true, "use_bias_correction": false, "use_triton": false, "warmup_init": false, "weight_decay": 0.0, "weight_lr_power": null }, "optimizer_defaults": { "ADAFACTOR": { "__version": 0, "optimizer": "ADAFACTOR", "adam_w_mode": false, "alpha": null, "amsgrad": false, "beta1": null, "beta2": null, "beta3": null, "bias_correction": false, "block_wise": false, "capturable": false, "centered": false, "clip_threshold": 1.0, "d0": null, "d_coef": null, "dampening": null, "decay_rate": -0.8, "decouple": false, "differentiable": false, "eps": 1e-30, "eps2": 0.001, "foreach": false, "fsdp_in_use": false, "fused": false, "fused_back_pass": true, "growth_rate": null, "initial_accumulator_value": null, "is_paged": false, "log_every": null, "lr_decay": null, "max_unorm": null, "maximize": false, "min_8bit_size": null, "momentum": null, "nesterov": false, "no_prox": false, "optim_bits": null, "percentile_clipping": null, "r": null, "relative_step": false, "safeguard_warmup": false, "scale_parameter": false, "stochastic_rounding": true, "use_bias_correction": false, "use_triton": false, "warmup_init": false, "weight_decay": 0.0, "weight_lr_power": null } }, "sample_definition_file_name": "training_samples/samples.json", "samples": [ { "__version": 0, "enabled": true, "prompt": "very aesthetic, best quality, newest, sensitive, 1girl, solo, upper body,", "negative_prompt": "monochrome, sketch, fat,", "height": 1024, "width": 1024, "seed": 42, "random_seed": true, "diffusion_steps": 30, "cfg_scale": 7.0, "noise_scheduler": "EULER_A" } ], "sample_after": 30, "sample_after_unit": "MINUTE", "sample_image_format": "JPG", "samples_to_tensorboard": true, "non_ema_sampling": true, "backup_after": 30, "backup_after_unit": "MINUTE", "rolling_backup": true, "rolling_backup_count": 10, "backup_before_save": true, "save_after": 30, "save_after_unit": "MINUTE", "save_filename_prefix": "" } ``` ## Limitations and Bias ### Bias - This model is intended for anime illustrations. Realistic capabilites are not tested at all. ### Limitations - Can fall back to realistic. Add "realistic" tag to the negatives when this happens. - Far shot eyes can be bad. - Anatomy and hands can be bad. - Still in active training. 
## License

SoteDiffusion models fall under the [Fair AI Public License 1.0-SD](https://freedevproject.org/faipl-1.0-sd/), which is compatible with the Stable Diffusion models' license. Key points:

1. **Modification Sharing:** If you modify SoteDiffusion models, you must share both your changes and the original license.
2. **Source Code Accessibility:** If your modified version is network-accessible, provide a way (like a download link) for others to get the source code. This applies to derived models too.
3. **Distribution Terms:** Any distribution must be under this license or another with similar rules.
4. **Compliance:** Non-compliance must be fixed within 30 days to avoid license termination, emphasizing transparency and adherence to open-source values.

**Notes**: Anything not covered by the Fair AI license is inherited from the Stability AI Non-Commercial License, which is included as LICENSE_INHERIT. This means there is still no commercial use of any kind.