|
--- |
|
license: other |
|
license_name: fair-ai-public-license-1.0-sd |
|
license_link: https://freedevproject.org/faipl-1.0-sd/ |
|
language: |
|
- en |
|
base_model: |
|
- Laxhar/noobai-XL-1.0 |
|
pipeline_tag: text-to-image |
|
library_name: diffusers |
|
tags: |
|
- safetensors |
|
- diffusers |
|
- stable-diffusion |
|
- stable-diffusion-xl |
|
--- |
|
# V-Prediction Loss Weighting Test |
|
|
|
## Notice |
|
This repository contains personal experimental records. No guarantees are made regarding accuracy or reproducibility. |
|
**These models are for verification purposes only and are not intended for general use.**
|
## Overview |
|
This repository is a test project comparing different loss weighting schemes for v-prediction training of Stable Diffusion XL (SDXL).
|
|
|
## Environment |
|
- [sd-scripts](https://github.com/kohya-ss/sd-scripts) dev branch |
|
- Commit hash: `6adb69b`, with local modifications
|
|
|
## Test Cases |
|
|
|
This repository includes test models using different weighting schemes: |
|
|
|
1. **test_normal_weight** |
|
- Baseline model using standard weighting |
|
|
|
2. **test_edm2_weighting** |
|
- New loss weighting scheme |
|
- Implementation by A
|
|
|
3. **test_min_snr_1** |
|
- Baseline model with `--min_snr_gamma=1`
|
|
|
4. **test_debias_scale-like** |
|
- Baseline model with additional parameters: |
|
- `--debiased_estimation_loss` |
|
- `--scale_v_pred_loss_like_noise_pred` |
|
|
|
5. **test_edm2_weight_new** |
|
- New loss weighting scheme |
|
- Implementation by madman404 |
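
The schemes listed above differ mainly in how the per-sample loss is scaled as a function of the signal-to-noise ratio (SNR) at the sampled timestep. The sketch below shows the weights as they are commonly described for v-prediction. The formulas are assumptions based on the upstream options and on the EDM2 uncertainty-weighting idea; the modified scripts in this repository (including `t.py` and `lossweightMLP.py`) may compute them differently. All function names are hypothetical.

```python
import math


def min_snr_weight_v(snr: float, gamma: float = 1.0) -> float:
    """Min-SNR-gamma weight for v-prediction: min(SNR, gamma) / (SNR + 1)."""
    return min(snr, gamma) / (snr + 1.0)


def debiased_weight_v(snr: float, snr_cap: float = 1000.0) -> float:
    """Debiased-estimation weight for v-prediction: 1 / (SNR + 1).

    The SNR is capped (assumed cap: 1000) because it diverges near timestep 0.
    """
    snr = min(snr, snr_cap)
    return 1.0 / (snr + 1.0)


def scale_like_noise_pred(snr: float, snr_cap: float = 1000.0) -> float:
    """Scale a v-prediction loss like a noise-prediction loss: SNR / (SNR + 1)."""
    snr = min(snr, snr_cap)
    return snr / (snr + 1.0)


def edm2_uncertainty_loss(raw_loss: float, log_var: float) -> float:
    """EDM2-style uncertainty weighting: loss / exp(u) + u.

    `log_var` (u) stands in for the output of a small learned network that
    predicts a log-variance per noise level; here it is a plain number.
    """
    return raw_loss / math.exp(log_var) + log_var


if __name__ == "__main__":
    # Compare the fixed weights at a few SNR values (illustrative only).
    for snr in (0.01, 0.1, 1.0, 10.0, 100.0):
        combined = debiased_weight_v(snr) * scale_like_noise_pred(snr)
        print(
            f"snr={snr:>6}: min_snr_1={min_snr_weight_v(snr, 1.0):.4f} "
            f"debias*scale_like={combined:.4f}"
        )
```

Under these assumptions, `test_min_snr_1` and `test_debias_scale-like` both down-weight high-SNR (low-noise) timesteps, while the EDM2-style schemes learn the weighting during training instead of fixing it in advance.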
|
|
|
## Training Parameters |
|
For detailed parameters, please refer to the `.toml` files in each model directory. |
|
Each model was trained with the `sdxl_train.py` found in its own model directory. `test_edm2_weighting` additionally uses `t.py`, and `test_edm2_weight_new` additionally uses `lossweightMLP.py`.
|
|
|
Common parameters: |
|
- Samples: 57,373 |
|
- Epochs: 3 |
|
- U-Net only |
|
- Learning rate: 3.5e-6 |
|
- Batch size: 8 |
|
- Gradient accumulation steps: 4 |
|
- Optimizer: Adafactor (stochastic rounding) |
|
- Training time: 13.5 GPU hours per trial on an RTX 4090
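
As a rough guide to what the common parameters imply, the following sketch derives the effective batch size and the approximate number of optimizer steps. It assumes a single GPU and ignores aspect-ratio bucketing, which can change the actual number of batches per epoch, so the real step count may differ slightly.

```python
import math

# Values taken from the common parameters listed above.
samples = 57_373
epochs = 3
batch_size = 8
grad_accum_steps = 4

# Samples contributing to each optimizer update.
effective_batch = batch_size * grad_accum_steps

# Forward/backward passes per epoch (the last batch may be partial).
micro_batches_per_epoch = math.ceil(samples / batch_size)

# Optimizer updates per epoch and over the whole run.
optimizer_steps_per_epoch = math.ceil(micro_batches_per_epoch / grad_accum_steps)
total_optimizer_steps = optimizer_steps_per_epoch * epochs

print(f"effective batch size:      {effective_batch}")
print(f"micro-batches per epoch:   {micro_batches_per_epoch}")
print(f"optimizer steps per epoch: {optimizer_steps_per_epoch}")
print(f"total optimizer steps:     {total_optimizer_steps}")
```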
|
|
|
## Dataset Information |
|
The dataset used for testing consists of: |
|
- ~53,000 images extracted from danbooru2023 based on specific artist styles (approximately 300 artists) |
|
- ~4,000 carefully selected danbooru images for standardization |
|
|
|
**Note**: This dataset is a subset of my regular training data and focuses on specific artists, so the models may generalize poorly beyond those styles. A wildcard file (`wildcard_style.txt`) listing the included artists is provided for reference.
|
|
|
### Tag Format |
|
The training follows the tag format from [Kohaku-XL-Epsilon](https://huggingface.co/KBlueLeaf/Kohaku-XL-Epsilon): |
|
`<1girl/1boy/1other/...>, <character>, <series>, <artists>, <general tags>, <quality tags>, <year tags>, <meta tags>, <rating tags>` |
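
The ordering above can be applied mechanically when assembling prompts. The helper below is a hypothetical sketch, not part of this repository: the group names and the example tags (including the quality and rating tags) are illustrative assumptions.

```python
# Tag groups in the order given by the tag format above.
TAG_ORDER = (
    "subject",    # 1girl / 1boy / 1other / ...
    "character",
    "series",
    "artists",
    "general",
    "quality",
    "year",
    "meta",
    "rating",
)


def build_prompt(**groups: list[str]) -> str:
    """Join tag groups into one comma-separated prompt in TAG_ORDER."""
    unknown = set(groups) - set(TAG_ORDER)
    if unknown:
        raise ValueError(f"unknown tag groups: {sorted(unknown)}")
    parts: list[str] = []
    for name in TAG_ORDER:
        # Missing groups are simply skipped; empty strings are dropped.
        parts.extend(tag for tag in groups.get(name, []) if tag)
    return ", ".join(parts)


prompt = build_prompt(
    artists=["ask \\(askzy\\)"],
    subject=["1girl"],
    general=["solo", "looking at viewer"],
    quality=["masterpiece"],
    rating=["safe"],
)
print(prompt)
```

Keyword arguments can be passed in any order; the helper always emits the groups in the order of the tag format.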
|
|
|
### Style Prompts |
|
The following style prompts from Kohaku-XL-Epsilon might be compatible (untested): |
|
``` |
|
ask \(askzy\), torino aqua, migolu, (jiu ye sang:1.1), (rumoon:0.9), (mizumi zumi:1.1) |
|
``` |
|
``` |
|
ciloranko, maccha \(mochancc\), lobelia \(saclia\), migolu, |
|
ask \(askzy\), wanke, (jiu ye sang:1.1), (rumoon:0.9), (mizumi zumi:1.1) |
|
``` |
|
``` |
|
shiro9jira, ciloranko, ask \(askzy\), (tianliang duohe fangdongye:0.8) |
|
``` |
|
``` |
|
(azuuru:1.1), (torino aqua:1.2), (azuuru:1.1), kedama milk, |
|
fuzichoco, ask \(askzy\), chen bin, atdan, hito, mignon |
|
``` |
|
``` |
|
ask \(askzy\), torino aqua, migolu |
|
``` |
|
|
|
|
|
*This model card was written with the assistance of Claude 3.5 Sonnet.* |