## Training Parameters

### flexible:

Dataset construction information was lost. Training likely resulted in around 600 to 1.4k steps.

```
$learning_rate = 3e-5
$lr_warmup_ratio = 0.10
$train_batch_size = 4
$num_epochs = 1
$save_every_n_epochs=1
$scheduler="constant_with_warmup"
$network_dim=128
```

### strict:

Used three categories of images: full-body original outfit, original outfit from the thigh up, and alternative costume images. Images that included goggles around the neck were explicitly pruned, because previous training attempts resulted in models where a brown suggestion of goggles would often show up even when not prompted for.

```
amber full body: 10 repeats * 31 images = 310
good amber: 30 repeats * 7 images = 210
amber generic: 8 repeats * 35 images = 280
Total images with repeats: 800
Max training steps 800 / 4 * 5 = 1000
Warmup steps: 100, Batch size: 4, Epochs: 5
last epoch was chosen for this model.
```

```
$learning_rate = 7e-5
$lr_warmup_ratio = 0.10
$train_batch_size = 4
$num_epochs = 5
$save_every_n_epochs=1
$scheduler="cosine_with_restarts"
$network_dim=128
```

## Points of failure:

Overfitting of the gold pattern on her stomach area was a common issue across all models when trying to train enough to capture the original outfit.

Underfitting of the shoes was also common. Usually the gold trim is wrong, the red inner sock portion is missing, or the shoe is entirely red or brown or the wrong length.

Goggles affected image quality a lot because they would leave traces of brown smudges in some generations even when goggles were not specified. This is likely fixable with a longer training period, more examples, and lower learning rates. For the strict model, we chose to remove images that included goggles around the neck to avoid it.

## Considerations for the future:

Consider crops for the legs so the model can learn the shoes better, since the details around them are among the things it gets wrong most often.
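
One way to prepare such leg crops is to cut the lower part of each full-body image. The sketch below only computes the crop box; the function name `lower_crop_box` and the default fraction of 0.5 are hypothetical, and it assumes the legs sit in the bottom portion of the frame, which will not hold for every image.

```python
from typing import Tuple


def lower_crop_box(width: int, height: int, keep_fraction: float = 0.5) -> Tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) box covering the bottom part of an image.

    keep_fraction is the share of the image height to keep, measured from the
    bottom edge. The 0.5 default is an illustrative assumption, not a tuned value.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    upper = height - round(height * keep_fraction)
    return (0, upper, width, height)


# Example: keep the bottom half of a 512x768 full-body image.
print(lower_crop_box(512, 768))  # (0, 384, 512, 768)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so it can be passed to that method directly if Pillow is used for the actual cropping.
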
Quite often we reach a point where all the other details are trained in well but the shoes come out the wrong color (likely because the word "boot" is associated with brown boots). Many art pieces do not agree on the finer details of the gold trim of the boots, so the model cannot learn an accurate version of it. One possible solution is to generate many images that include the boots and choose the ones with a more correct pattern as training images for future revisions.

## Learnings:

You can very quickly learn a character's likeness by overfitting with a high learning rate like 1e-4 and training only on original outfit images. As few as 200 steps with batch size 4 were enough. Learning rates below 2e-5 were ineffective at learning all the required details (if your goal is to finish training in under 30 minutes). The learning rate scheduler is not really worth changing, other than to use cosine with restarts together with many epochs. It is usually better to just change the learning rate or adjust the training data.
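
When adjusting the training data or epoch count, it helps to recompute the step budget. The sketch below reproduces the step arithmetic listed for the strict model. The function name `plan_steps` and the folder dictionary are hypothetical, and rounding partial batches up is an assumption; the training script itself may count steps differently.

```python
import math


def plan_steps(folders, batch_size, epochs, warmup_ratio):
    """Return (images_with_repeats, max_steps, warmup_steps).

    folders maps a folder name to a (repeats, image_count) pair.
    """
    images_with_repeats = sum(repeats * images for repeats, images in folders.values())
    # Assumption: a final partial batch still counts as one step.
    steps_per_epoch = math.ceil(images_with_repeats / batch_size)
    max_steps = steps_per_epoch * epochs
    warmup_steps = round(max_steps * warmup_ratio)
    return images_with_repeats, max_steps, warmup_steps


# Folder values taken from the strict section of this document.
strict_folders = {
    "amber full body": (10, 31),
    "good amber": (30, 7),
    "amber generic": (8, 35),
}

total, max_steps, warmup = plan_steps(strict_folders, batch_size=4, epochs=5, warmup_ratio=0.10)
print(total, max_steps, warmup)  # 800 1000 100
```
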