---
pipeline_tag: text-to-image
license: other
license_name: stable-cascade-nc-community
license_link: LICENSE
---

# SoteDiffusion Cascade

Anime finetune of the Stable Cascade decoder (Stage B).

No commercial use, as required by Stability AI's Stable Cascade non-commercial community license.

## Code Example

```shell
pip install diffusers transformers accelerate
```

```python
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prompt = "newest, 1girl, solo, cat ears, looking at viewer, blush, light smile,"
negative_prompt = "very displeasing, worst quality, monochrome, sketch, fat, child,"

# Stage C (prior) and this finetuned Stage B (decoder), both in half precision
prior = StableCascadePriorPipeline.from_pretrained("Disty0/sote-diffusion-cascade_alpha0", torch_dtype=torch.float16)
decoder = StableCascadeDecoderPipeline.from_pretrained("Disty0/sote-diffusion-cascade-decoder_alpha0", torch_dtype=torch.float16)

# Step 1: the prior turns the prompt into image embeddings
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    negative_prompt=negative_prompt,
    guidance_scale=7.0,
    num_images_per_prompt=1,
    num_inference_steps=40,
)

# Step 2: the decoder turns the embeddings into the final image
decoder.enable_model_cpu_offload()
decoder_output = decoder(
    image_embeddings=prior_output.image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    guidance_scale=1.5,
    output_type="pil",
    num_inference_steps=10,
).images[0]
decoder_output.save("cascade.png")
```

## Dataset

Uses the same dataset as Disty0/sote-diffusion-cascade-decoder_pre-alpha0. Trained on ~98K images.
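
The config under Training sets `batch_size: 1`, `grad_accum_steps: 1` and `updates: 98000`, which lines up with the ~98K image count. The sketch below works out what those numbers imply. It assumes one update is one optimizer step over `batch_size * grad_accum_steps` samples and that a save or backup fires once per full interval of updates. Both are assumptions about the trainer, not behavior confirmed by this card, and the dataset size is the approximate ~98K figure treated as exact.

```python
# Values copied from the training config in this card.
CONFIG = {
    "batch_size": 1,
    "grad_accum_steps": 1,
    "updates": 98000,
    "save_every": 1024,
    "backup_every": 2048,
    "warmup_updates": 100,
}
# Approximate dataset size ("~98K images"), treated as exact for illustration.
DATASET_SIZE = 98_000


def training_summary(cfg: dict, dataset_size: int) -> dict:
    """Back-of-the-envelope numbers implied by the training config."""
    # Assumption: one update = one optimizer step over batch_size * grad_accum_steps samples.
    effective_batch = cfg["batch_size"] * cfg["grad_accum_steps"]
    samples_seen = cfg["updates"] * effective_batch
    return {
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
        "epochs": samples_seen / dataset_size,
        # Assumption: a save/backup fires once per full interval of updates.
        "saves": cfg["updates"] // cfg["save_every"],
        "backups": cfg["updates"] // cfg["backup_every"],
        "warmup_fraction": cfg["warmup_updates"] / cfg["updates"],
    }


if __name__ == "__main__":
    for key, value in training_summary(CONFIG, DATASET_SIZE).items():
        print(f"{key}: {value}")
```

Under these assumptions the run is roughly a single pass over the dataset. It produces 95 regular saves and 47 backups, and warmup covers about 0.1% of training.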

## Training

**GPU used for training**: 1x AMD RX 7900 XTX 24GB

**Software used**: https://github.com/2kpr/StableCascade

### Config

```yaml
experiment_id: sotediffusion-sc-b_3b
model_version: 3B
dtype: bfloat16
use_fsdp: False

batch_size: 1
grad_accum_steps: 1
updates: 98000
backup_every: 2048
save_every: 1024
warmup_updates: 100

lr: 4.0e-6
optimizer_type: Adafactor
adaptive_loss_weight: True
stochastic_rounding: True

image_size: 768
multi_aspect_ratio: [1/1, 1/2, 1/3, 2/3, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, 5/6, 9/16]
shift: 4

checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
output_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/
webdataset_path: file:/mnt/DataSSD/AI/anime_image_dataset/best/newest_best-{0000..0001}.tar

effnet_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/effnet_encoder.safetensors
stage_a_checkpoint_path: /mnt/DataSSD/AI/models/sd-cascade/stage_a.safetensors
generator_checkpoint_path: /mnt/DataSSD/AI/SoteDiffusion/StableCascade/sotediffusion-sc_3b-stage_b.safetensors
```
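
The `shift: 4` setting moves the noise schedule toward higher noise levels. The sketch below illustrates one common way such a shift is defined, the "shifted schedule" formulation, where logSNR is offset by `2 * log(1 / shift)`. It applies that offset to a cosine schedule. Both the cosine schedule and the exact shift formula are assumptions for illustration, not read from the 2kpr/StableCascade trainer.

```python
import math


def cosine_log_snr(t: float, s: float = 0.008) -> float:
    """logSNR of a cosine noise schedule at time t in (0, 1)."""
    def alpha_bar(u: float) -> float:
        return math.cos((u + s) / (1 + s) * math.pi / 2) ** 2

    # Normalise so alpha_bar(0) == 1, then SNR = alpha_bar / (1 - alpha_bar).
    a = alpha_bar(t) / alpha_bar(0.0)
    return math.log(a / (1 - a))


def shifted_log_snr(t: float, shift: float = 4.0) -> float:
    """Assumed shift rule: lower logSNR by 2*log(shift), i.e. divide SNR by shift**2."""
    return cosine_log_snr(t) + 2 * math.log(1 / shift)


if __name__ == "__main__":
    for t in (0.1, 0.25, 0.5, 0.75, 0.9):
        base, shifted = cosine_log_snr(t), shifted_log_snr(t)
        print(f"t={t:.2f}  logSNR={base:+.3f}  shifted={shifted:+.3f}")
```

Under this formulation, `shift: 4` divides the signal-to-noise ratio by 16 at every timestep. Training therefore spends more of its steps at noisier inputs, a tweak usually motivated by training at larger resolutions.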

## Limitations and Bias

### Bias

- This model is intended for anime illustrations. Its photorealistic capabilities have not been tested at all.

### Limitations

- Eyes in far shots still come out poorly because of the heavy latent compression.
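
To make "heavy latent compression" concrete, here is a back-of-the-envelope sketch of how many latent pixels a small feature occupies. The compression factors come from Stability AI's published Stable Cascade figures, not from this card. They are a 4x spatial factor for Stage A and 24x24 Stage C latents for a 1024x1024 image (about 42.7x). The 12 px eye is a hypothetical example.

```python
# Assumed spatial compression factors for Stable Cascade (not stated in this card).
STAGE_A_FACTOR = 4            # Stage A latents: 1024x1024 image -> 256x256
STAGE_C_FACTOR = 1024 / 24    # Stage C latents: 1024x1024 image -> 24x24 (~42.7x)


def latent_footprint(feature_px: float) -> dict:
    """Width, in latent pixels, of a feature that is `feature_px` image pixels wide."""
    return {
        "stage_a_latent_px": feature_px / STAGE_A_FACTOR,
        "stage_c_latent_px": feature_px / STAGE_C_FACTOR,
    }


if __name__ == "__main__":
    # Hypothetical: an eye roughly 12 px wide in a full-body shot at 1024x1024.
    print(latent_footprint(12))
```

With these numbers, such an eye spans about 3 Stage A latent pixels and under a third of a single Stage C latent pixel. One plausible reading is that the decoder has to reconstruct the eye almost entirely from context.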