V-Prediction Loss Weighting Test

Notice

This repository contains personal experimental records. No guarantees are made regarding accuracy or reproducibility.
These models are for verification purposes only and are not intended for general use.

Overview

This repository is a test project comparing different loss weighting schemes for Stable Diffusion v-prediction training.

Environment

sd-scripts dev branch
- Commit hash: 6adb69b, with local modifications

Test Cases

This repository includes test models using different weighting schemes:

test_normal_weight
- Baseline model using standard weighting
test_edm2_weighting
- New loss weighting scheme
- Implementation by A
test_min_snr_1
- Baseline model with --min_snr_gamma=1
test_debias_scale-like
- Baseline model with additional parameters:
  - --debiased_estimation_loss
  - --scale_v_pred_loss_like_noise_pred
test_edm2_weight_new
- New loss weighting scheme
- Implementation by madman404
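As a rough illustration of how the three non-EDM2 options reweight the per-timestep loss, the sketch below computes each weight as a function of the signal-to-noise ratio (SNR). The function names are hypothetical, and the formulas are the commonly published forms (Min-SNR for v-prediction, SNR/(SNR+1) scaling, and 1/sqrt(SNR) debiasing). The modified sd-scripts build used here may differ, particularly for debiased estimation under v-prediction.

```python
import math

def min_snr_weight(snr: float, gamma: float = 1.0, v_prediction: bool = True) -> float:
    """Min-SNR weighting: clamp the SNR at gamma, then normalize."""
    clamped = min(snr, gamma)
    # For v-prediction the divisor is SNR + 1; for epsilon-prediction it is SNR.
    return clamped / (snr + 1.0) if v_prediction else clamped / snr

def scale_like_noise_pred_weight(snr: float) -> float:
    """Scale a v-prediction loss so it behaves like a noise-prediction loss."""
    return snr / (snr + 1.0)

def debiased_weight(snr: float, max_snr: float = 1000.0) -> float:
    """Debiased estimation weight, assumed here to be 1 / sqrt(SNR) with a clamp."""
    return 1.0 / math.sqrt(min(snr, max_snr))

if __name__ == "__main__":
    # Low, medium, and high SNR correspond to noisy, mid, and nearly clean timesteps.
    for snr in (0.01, 1.0, 100.0):
        print(
            f"snr={snr:>6}: "
            f"min_snr={min_snr_weight(snr):.4f}, "
            f"scale={scale_like_noise_pred_weight(snr):.4f}, "
            f"debias={debiased_weight(snr):.4f}"
        )
```

With gamma = 1, the Min-SNR weight for v-prediction reaches its maximum of 0.5 at SNR = 1 and falls off toward both very noisy and nearly clean timesteps.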

Training Parameters

For detailed parameters, please refer to the .toml files in each model directory. Each model was trained with the sdxl_train.py found in its own directory. Two models use an additional script: test_edm2_weighting also uses t.py, and test_edm2_weight_new also uses lossweightMLP.py.
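This card does not describe what the EDM2-style scripts do internally. As a hedged illustration only, the sketch below assumes they follow the uncertainty-based weighting from the EDM2 paper, where a learned log-variance u (predicted per noise level by a small MLP in the paper) rescales the raw loss as loss / exp(u) + u. The function name and the scalar setup are hypothetical and are not taken from t.py or lossweightMLP.py.

```python
import math

def uncertainty_weighted_loss(raw_loss: float, log_var: float) -> float:
    """EDM2-style objective for one noise level: raw_loss / exp(u) + u."""
    return raw_loss / math.exp(log_var) + log_var

# For a fixed raw loss, the objective is minimized at u = ln(raw_loss),
# so the learned u tracks the log of the typical loss at each noise level.
raw = 0.25
best_u = math.log(raw)
candidates = [best_u - 1.0, best_u, best_u + 1.0]
values = [uncertainty_weighted_loss(raw, u) for u in candidates]

if __name__ == "__main__":
    for u, v in zip(candidates, values):
        print(f"u={u:+.3f} -> objective={v:+.4f}")
```

Because the optimal u equals the log of the expected loss, dividing by exp(u) brings every noise level's loss to a comparable scale, which is the motivation for the scheme.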

Common parameters:

Samples: 57,373
Epochs: 3
U-Net only
Learning rate: 3.5e-6
Batch size: 8
Gradient accumulation steps: 4
Optimizer: Adafactor (stochastic rounding)
Training time: 13.5 GPU hours (RTX4090) per trial
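The effective batch size and approximate number of optimizer steps follow from the figures above. This is a back-of-the-envelope sketch that assumes a single GPU, one repeat per image, and no batch-count changes from aspect-ratio bucketing. The actual step count logged by sd-scripts may differ slightly.

```python
import math

samples = 57_373
epochs = 3
batch_size = 8
grad_accum = 4

# Number of samples contributing to each optimizer update.
effective_batch = batch_size * grad_accum

# Forward/backward passes per epoch, then optimizer updates per epoch.
batches_per_epoch = math.ceil(samples / batch_size)
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 32 1793 5379
```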

Dataset Information

The dataset used for testing consists of:

~53,000 images extracted from danbooru2023 based on specific artist styles (approximately 300 artists)
~4,000 carefully selected danbooru images for standardization

Note: As this dataset is a subset of my regular training data focused on specific artists, the model's generalization might be limited. A wildcard file (wildcard_style.txt) containing the list of included artists is provided for reference.

Tag Format

The training follows the tag format from Kohaku-XL-Epsilon: <1girl/1boy/1other/...>, <character>, <series>, <artists>, <general tags>, <quality tags>, <year tags>, <meta tags>, <rating tags>
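The ordering above can be sketched as a small caption builder. The field names, the helper function, and the example tags are illustrative placeholders rather than part of the training pipeline. Only the field order is taken from the format described above.

```python
# Field order taken from the tag format above; the key names are hypothetical.
FIELD_ORDER = (
    "subject", "character", "series", "artists",
    "general", "quality", "year", "meta", "rating",
)

def build_caption(tags: dict) -> str:
    """Join tag groups in the documented order, skipping missing groups."""
    parts = []
    for field in FIELD_ORDER:
        parts.extend(tags.get(field, []))
    return ", ".join(parts)

# Placeholder tags for illustration only.
example = {
    "general": ["solo", "long hair"],
    "subject": ["1girl"],
    "artists": ["torino aqua"],
    "quality": ["masterpiece"],
    "rating": ["safe"],
}

print(build_caption(example))
# 1girl, torino aqua, solo, long hair, masterpiece, safe
```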

Style Prompts

The following style prompts from Kohaku-XL-Epsilon might be compatible (untested):

ask \(askzy\), torino aqua, migolu, (jiu ye sang:1.1), (rumoon:0.9), (mizumi zumi:1.1)

ciloranko, maccha \(mochancc\), lobelia \(saclia\), migolu, 
ask \(askzy\), wanke, (jiu ye sang:1.1), (rumoon:0.9), (mizumi zumi:1.1)

shiro9jira, ciloranko, ask \(askzy\), (tianliang duohe fangdongye:0.8)

(azuuru:1.1), (torino aqua:1.2), (azuuru:1.1), kedama milk, 
fuzichoco, ask \(askzy\), chen bin, atdan, hito, mignon

ask \(askzy\), torino aqua, migolu

This model card was written with the assistance of Claude 3.5 Sonnet.
