SoteDiffusion Cascade

Anime finetune of Stable Cascade.
It is currently in a very early stage of training.
No commercial use is allowed, due to StabilityAI's license.

Code Example

pip install diffusers transformers accelerate

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

# Aesthetic, quality and date tags come first, followed by the content tags
prompt = "(extremely aesthetic, best quality, newest), 1girl, solo, cat ears, looking at viewer, blush, light smile, upper body,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, blurry, fat, child,"

# Load the finetuned prior (Stage C) and decoder in half precision
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_pre-alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_pre-alpha0", torch_dtype=torch.float16)

# Keep each model on the CPU and move it to the GPU only while it runs, to save VRAM
prior.enable_model_cpu_offload()
# Stage C: turn the text prompt into compressed image embeddings
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=6.0,
    num_images_per_prompt=1,
    num_inference_steps=40
)

decoder.enable_model_cpu_offload()
# Decode the image embeddings into the final image
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=2.0,
    output_type="pil",
    num_inference_steps=10
).images[0]
decoder_output.save("cascade.png")

Training Status:

GPU used for training: 1x AMD RX 7900 XTX 24GB

| Dataset name | Training done | Remaining |
|---|---|---|
| newest | 002 | 218 |
| late | 002 | 204 |
| mid | 002 | 199 |
| early | 002 | 053 |
| oldest | 002 | 014 |
| pixiv | 002 | 072 |
| visual novel cg | 002 | 068 |
| anime wallpaper | 002 | 011 |
| Total | 24 | 839 |

Note: Chunk numbering starts from 0, so "002" means chunks 000 to 002 (3 chunks per dataset) are done. Each chunk contains 8000 images.
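The Training Status columns follow directly from the per-dataset chunk totals. A minimal sketch, assuming "002" means chunks 000 to 002 are finished and that every finished chunk is full; the names TOTAL_CHUNKS and remaining_chunks are hypothetical, and the totals are copied from the Dataset table.

```python
# Hypothetical sketch: recompute the Training Status columns from the chunk totals.
# Chunk totals are copied from the Dataset table of this card.
TOTAL_CHUNKS = {
    "newest": 221,
    "late": 207,
    "mid": 202,
    "early": 56,
    "oldest": 17,
    "pixiv": 75,
    "visual novel cg": 71,
    "anime wallpaper": 14,
}
LAST_DONE_CHUNK = 2  # "002": zero-based index of the last finished chunk
IMAGES_PER_CHUNK = 8000


def remaining_chunks(total: int, last_done: int) -> int:
    """Chunks left to train when chunks 0..last_done are finished."""
    return total - (last_done + 1)


remaining = {name: remaining_chunks(total, LAST_DONE_CHUNK) for name, total in TOTAL_CHUNKS.items()}
done_chunks = (LAST_DONE_CHUNK + 1) * len(TOTAL_CHUNKS)

print(remaining["newest"])                   # 218
print(done_chunks, sum(remaining.values()))  # 24 839
# Images in the finished chunks, assuming each of them is full
print(done_chunks * IMAGES_PER_CHUNK)        # 192000
```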

Dataset:

GPU used for captioning: 1x Intel ARC A770 16GB
Model used for captioning: SmilingWolf/wd-v1-4-convnextv2-tagger-v2

| Dataset name | Total images | Total chunks |
|---|---|---|
| newest | 1,766,335 | 221 |
| late | 1,652,420 | 207 |
| mid | 1,609,608 | 202 |
| early | 442,368 | 056 |
| oldest | 128,311 | 017 |
| pixiv | 594,046 | 075 |
| visual novel cg | 560,903 | 071 |
| anime wallpaper | 106,882 | 014 |
| Total | 6,860,873 | 863 |

Note: The smallest image size is 1280x600 (768,000 pixels).
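The chunk totals above match splitting each dataset into 8000-image chunks and rounding up. A minimal sketch of that arithmetic, assuming the last chunk of each dataset is simply left partially filled; chunk_count is a hypothetical helper.

```python
# Hypothetical sketch: chunk count per dataset = ceil(images / 8000).
IMAGES_PER_CHUNK = 8000

# Image counts copied from the Dataset table
TOTAL_IMAGES = {
    "newest": 1_766_335,
    "late": 1_652_420,
    "mid": 1_609_608,
    "early": 442_368,
    "oldest": 128_311,
    "pixiv": 594_046,
    "visual novel cg": 560_903,
    "anime wallpaper": 106_882,
}


def chunk_count(num_images: int, per_chunk: int = IMAGES_PER_CHUNK) -> int:
    """Integer ceiling division: the last chunk may be only partially filled."""
    return (num_images + per_chunk - 1) // per_chunk


chunks = {name: chunk_count(n) for name, n in TOTAL_IMAGES.items()}
print(chunks)
print(sum(TOTAL_IMAGES.values()), sum(chunks.values()))  # 6860873 863
```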

Tags:

Tag order: aesthetic tags, quality tags, date tags, custom tags, then the rest of the tags.

Date:

| Tag | Date |
|---|---|
| newest | 2022 to 2024 |
| late | 2019 to 2021 |
| mid | 2015 to 2018 |
| early | 2011 to 2014 |
| oldest | 2005 to 2010 |
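The date tags are plain year ranges. A minimal sketch, assuming the year is already known for each image (the card does not say which date field it comes from); date_tag is a hypothetical helper that returns None for years outside 2005 to 2024.

```python
from typing import Optional

# Year ranges copied from the Date table (both ends inclusive)
DATE_TAGS = [
    (2022, 2024, "newest"),
    (2019, 2021, "late"),
    (2015, 2018, "mid"),
    (2011, 2014, "early"),
    (2005, 2010, "oldest"),
]


def date_tag(year: int) -> Optional[str]:
    """Return the date tag for a year, or None if the table has no tag for it."""
    for start, end, tag in DATE_TAGS:
        if start <= year <= end:
            return tag
    return None


print(date_tag(2023), date_tag(2016), date_tag(2004))  # newest mid None
```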

Aesthetic Tags:

Model used: shadowlilac/aesthetic-shadow

| Score greater than | Tag |
|---|---|
| 0.980 | extremely aesthetic |
| 0.900 | very aesthetic |
| 0.750 | aesthetic |
| 0.500 | slightly aesthetic |
| 0.350 | not displeasing |
| 0.250 | not aesthetic |
| 0.125 | slightly displeasing |
| 0.025 | displeasing |
| (all lower scores) | very displeasing |

Quality Tags:

Model used: https://huggingface.co/hakurei/waifu-diffusion-v1-4/blob/main/models/aes-B32-v0.pth

| Score greater than | Tag |
|---|---|
| 0.980 | best quality |
| 0.900 | high quality |
| 0.750 | great quality |
| 0.500 | medium quality |
| 0.250 | normal quality |
| 0.125 | bad quality |
| 0.025 | low quality |
| (all lower scores) | worst quality |
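Both tables map a model score to a tag by taking the highest threshold the score exceeds. A minimal sketch, assuming the scores have already been produced by the models named above and that "greater than" means strictly greater; score_to_tag and the threshold constant names are hypothetical.

```python
from typing import List, Tuple

# (threshold, tag) pairs copied from the Aesthetic Tags and Quality Tags tables, highest first
AESTHETIC_THRESHOLDS: List[Tuple[float, str]] = [
    (0.980, "extremely aesthetic"),
    (0.900, "very aesthetic"),
    (0.750, "aesthetic"),
    (0.500, "slightly aesthetic"),
    (0.350, "not displeasing"),
    (0.250, "not aesthetic"),
    (0.125, "slightly displeasing"),
    (0.025, "displeasing"),
]
QUALITY_THRESHOLDS: List[Tuple[float, str]] = [
    (0.980, "best quality"),
    (0.900, "high quality"),
    (0.750, "great quality"),
    (0.500, "medium quality"),
    (0.250, "normal quality"),
    (0.125, "bad quality"),
    (0.025, "low quality"),
]


def score_to_tag(score: float, thresholds: List[Tuple[float, str]], fallback: str) -> str:
    """Return the tag of the highest threshold the score is strictly greater than."""
    for threshold, tag in thresholds:
        if score > threshold:
            return tag
    return fallback


print(score_to_tag(0.95, AESTHETIC_THRESHOLDS, "very displeasing"))  # very aesthetic
print(score_to_tag(0.30, QUALITY_THRESHOLDS, "worst quality"))       # normal quality
print(score_to_tag(0.01, QUALITY_THRESHOLDS, "worst quality"))       # worst quality
```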

Custom Tags:

| Dataset name | Custom tags |
|---|---|
| image boards | date, |
| pixiv | art by Display_Name, |
| visual novel cg | Full_VN_Name (short_3_letter_name), visual novel cg, |
| anime wallpaper | date, anime wallpaper, |
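Putting the tag groups together gives a comma-separated caption in the order listed under Tags. A minimal sketch, assuming the "date" entry in the Custom Tags table is the date tag itself, so datasets without it (pixiv, visual novel cg) simply have no date tag; build_caption is a hypothetical helper and Display_Name is the card's own placeholder.

```python
from typing import List, Optional


def build_caption(aesthetic: str, quality: str, date: Optional[str],
                  custom: List[str], rest: List[str]) -> str:
    """Join tag groups in the order listed under Tags, skipping a missing date tag."""
    parts = [aesthetic, quality]
    if date is not None:
        parts.append(date)
    parts.extend(custom)
    parts.extend(rest)
    return ", ".join(parts)


# Anime wallpaper image: the date tag, then the "anime wallpaper" custom tag
print(build_caption("aesthetic", "great quality", "late", ["anime wallpaper"], ["1girl", "solo"]))
# aesthetic, great quality, late, anime wallpaper, 1girl, solo

# Pixiv image: no date tag, the custom tag names the artist (placeholder here)
print(build_caption("very aesthetic", "high quality", None, ["art by Display_Name"], ["1girl"]))
# very aesthetic, high quality, art by Display_Name, 1girl
```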

Training Params:

Software used: Kohya SD-Scripts (Stable Cascade branch)
Base model: KBlueLeaf/Stable-Cascade-FP16-fixed

Command:

accelerate launch --mixed_precision fp16 --num_cpu_threads_per_process 1 stable_cascade_train_stage_c.py \
--mixed_precision fp16 \
--save_precision fp16 \
--full_fp16 \
--sdpa \
--gradient_checkpointing \
--resolution "1024,1024" \
--train_batch_size 2 \
--gradient_accumulation_steps 32 \
--adaptive_loss_weight \
--learning_rate 4e-6 \
--lr_scheduler constant_with_warmup \
--lr_warmup_steps 100 \
--optimizer_type adafactor \
--optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" \
--max_grad_norm 0 \
--token_warmup_min 1 \
--token_warmup_step 0 \
--shuffle_caption \
--caption_dropout_rate 0 \
--caption_tag_dropout_rate 0 \
--caption_dropout_every_n_epochs 0 \
--dataset_repeats 1 \
--save_state \
--save_every_n_steps 128 \
--sample_every_n_steps 32 \
--max_token_length 225 \
--max_train_epochs 1 \
--caption_extension ".txt" \
--max_data_loader_n_workers 2 \
--persistent_data_loader_workers \
--enable_bucket \
--min_bucket_reso 256 \
--max_bucket_reso 4096 \
--bucket_reso_steps 64 \
--bucket_no_upscale \
--log_with tensorboard \
--output_name sotediffusion-sc_3b \
--train_data_dir /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002 \
--in_json /mnt/DataSSD/AI/anime_image_dataset/combined/combined-0002.json \
--output_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2 \
--logging_dir /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-2/logs \
--resume /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1-state \
--stage_c_checkpoint_path /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-1/sotediffusion-sc_3b-1.safetensors \
--effnet_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors \
--previewer_checkpoint_path /mnt/DataSSD/AI/models/sd-cascade/previewer.safetensors \
--sample_prompts /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-prompt.txt
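With gradient accumulation, each optimizer update sees train_batch_size × gradient_accumulation_steps images. A minimal sketch of that arithmetic and of the learning-rate warmup, assuming Kohya's constant_with_warmup follows the usual diffusers/transformers definition (linear ramp from 0 over the warmup steps, then flat) and that warmup is counted in optimizer steps; optimizer_steps ignores the partial batches that aspect-ratio bucketing produces, so real step counts will be somewhat higher.

```python
# Values copied from the training command above
TRAIN_BATCH_SIZE = 2    # --train_batch_size
GRAD_ACCUM_STEPS = 32   # --gradient_accumulation_steps
NUM_GPUS = 1            # single RX 7900 XTX
BASE_LR = 4e-6          # --learning_rate
WARMUP_STEPS = 100      # --lr_warmup_steps

# Images contributing to each optimizer update
EFFECTIVE_BATCH = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_GPUS


def optimizer_steps(num_images: int) -> int:
    """Rough optimizer steps for one epoch; ignores partial batches from bucketing."""
    return (num_images + EFFECTIVE_BATCH - 1) // EFFECTIVE_BATCH


def constant_with_warmup_lr(step: int) -> float:
    """Linear ramp from 0 to BASE_LR over WARMUP_STEPS, then constant."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR


print(EFFECTIVE_BATCH)              # 64
print(optimizer_steps(8000))        # 125 (one full chunk)
print(constant_with_warmup_lr(50))  # roughly 2e-06
```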

Limitations and Bias

Bias

  • This model is intended for anime illustrations.
    Its realistic capabilities have not been tested at all.
  • The current version is biased toward older anime styles.

Limitations

  • Can fall back to a realistic style.
    Use the "anime illustration" tag to steer it in the right direction.
  • Eyes in far shots are poor due to the heavy latent compression.