How I train a LoRA: m3lt style training overview

Community Article · Published July 1, 2024

This article takes a step-by-step approach to outlining the method I used to train the 'm3lt' LoRA model. I'll provide the input images (synthetically generated) and rely on automatically generated captions, to show the importance of good images and good parameters. To train, I used LoRA Ease by multimodalart.

Thank you to multimodalart and the HF team!

Getting a training environment set up that was approachable, free to access for those with local GPUs, and met my ideal settings was a near impossible task. I have to really thank the Hugging Face team, in particular multimodalart (apolinario), for creating LoRA Ease and making the adjustments needed so that I could identify the perfect parameters. Open source AI is going to continue to change the world!

General notes

I personally do not think you can quickly learn LoRA training from a tutorial - I think the ideal situation is to learn it from another person, because it requires a mix of artistry and pattern recognition. That is why I rarely write "here is how to do X" style workflows for LoRA training.

If this is your first time trying a LoRA - it would be wise to use a few existing LoRAs with a model first, so that when you test your own model, you know what good results look like.

If this is your first time finetuning a model - expect that you might need to retrain it a couple times to get what you want. I have trained probably thousands of LoRAs and I still retrain some 2-3 times to get the right output.

Model Info

Notes

  • My goal with artistic LoRAs is to create a new style, and thus I do sometimes blend styles such as in this workflow.
  • It is possible to achieve even more nuance and accuracy in creating a particular style.
  • By leveraging captions using some of the suggestions I've made in past articles, you can be more accurate with your stylization.
  • Different styles require adjustments to parameters, in my experience. I recommend you use this as a jumping off point, and encourage you to make your own discoveries. I plan to do other workflows in the future to exemplify the difference.
  • If you would prefer to run a script locally, you can use this link to set it up.

Assembling Training Data

Possibly the most important step in creating your model is dataset curation. In the case of this model, I chose 54 images. You'll note that there is a range of style elements, with some common themes. This was intentional - I wanted to craft a style capable of a wide range of color, without overfitting on the more muted tones of the portraits, while keeping a sketchy quality in the linework.

[Image: the 54-image training dataset]

Typically, I keep my training datasets between 20-30 images. Because this style was intentionally aimed at having a wider array of style concepts present, I pushed the dataset to a larger size.
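
If you're assembling the dataset locally, a quick sanity check on image count and resolution can save a retrain. A minimal sketch (the `dataset/` folder name is a placeholder), assuming you'll train at 1024px as described later in this workflow:

```python
from pathlib import Path
from PIL import Image  # pip install pillow

DATASET_DIR = Path("dataset")  # hypothetical folder of training images
TARGET_RES = 1024              # training resolution used later in this workflow

paths = sorted(p for p in DATASET_DIR.iterdir() if p.suffix.lower() in {".png", ".jpg", ".jpeg"})
print(f"{len(paths)} images found")

for p in paths:
    with Image.open(p) as im:
        w, h = im.size
        if min(w, h) < TARGET_RES:
            # Upscaling small images tends to soften the style, so flag them.
            print(f"{p.name}: {w}x{h} is below {TARGET_RES}px on its short side")
```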

Captioning

In the case of this model, I wanted to show the strength of a good dataset and parameters, so I left the captions to the autocaption output on LoRA Ease. I could also leverage captions to engineer certain results, which I may show in a future article.
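
LoRA Ease generates these captions for you. If you're scripting locally and want something comparable, a BLIP captioner from transformers is one common choice - this is an illustrative sketch, not necessarily the captioner LoRA Ease itself uses:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Any off-the-shelf captioner works; BLIP is a common default.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

image = Image.open("dataset/img_001.png").convert("RGB")  # hypothetical file
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```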

Unique Token

I used the unique token provided by LoRA Ease. That said, the particular parameters I'm using here don't always require a unique token.

The style certainly comes through without the token here, using the prompt "woman."

[Image: output for the prompt "woman", without the unique token]

However, there's no doubt that the style shifts and solidifies slightly with the inclusion of "style of TOK" (same seed). We also see more depth of field and detail. This could be manipulated further with more advanced captioning techniques during training.

[Image: output for "woman, style of TOK", same seed]
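
To reproduce this kind of with/without comparison yourself, fix the seed and change only the prompt. A minimal sketch with diffusers, assuming an SDXL base model and a hypothetical LoRA repo id:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("your-username/your-lora")  # hypothetical repo id

def generate(prompt: str, seed: int = 42):
    # Re-create the generator each call so both prompts start from the same noise.
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(prompt, generator=generator).images[0]

without_token = generate("woman")
with_token = generate("woman, style of TOK")
```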

Training Parameters

For LoRA Ease I used the following training parameters. I am also testing an alternative "slow" training on a lower learning rate, which is important for nuanced styles such as photography. I'll share that preset in a different article.

[Image: LoRA Ease training parameter screen]

When you use LoRA Ease, feel free to select the Style preset for this workflow.

Advanced Options

  • Optimizer: AdamW
  • Use SNR Gamma: Yes
  • SNR Gamma Weight: 5
  • Mixed Precision: bf16
  • UNet LR: 0.0005
  • Max Training Steps: 3000[^1]
  • LoRA Rank: 32
  • Repeats: 12
  • Prior Pres Loss: Off
  • TI Embedding Training: Off[^2]
  • TE %: 0.2
  • TE Learning Rate: 0.00005
  • Resolution: 1024
  • Seed: 42 is fine; use whatever works best for you.
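
If you would rather drive a training script directly (see the local setup link above), the same recipe can be collected into a plain config. This is an illustrative sketch - the key names roughly follow diffusers-style script arguments and won't map one-to-one onto every script:

```python
# Illustrative config mirroring the LoRA Ease settings above; key names are
# approximations of diffusers-style script arguments, not exact flags.
training_config = {
    "optimizer": "AdamW",
    "snr_gamma": 5.0,                # Min-SNR weighting ("Use SNR Gamma: Yes")
    "mixed_precision": "bf16",
    "learning_rate": 5e-4,           # UNet LR
    "max_train_steps": 3000,
    "rank": 32,                      # LoRA rank
    "repeats": 12,
    "with_prior_preservation": False,
    "train_text_encoder_ti": False,  # TI embedding training off
    "train_text_encoder_frac": 0.2,  # TE %: train the text encoder for 20% of steps
    "text_encoder_lr": 5e-5,
    "resolution": 1024,
    "seed": 42,
}
```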

[^1]: I calculate max training steps by multiplying the number of training images by steps per image. Typically, I want between 50-75 steps per image when using a UNet LR of 1e-4 to 5e-4. Here, 54 images × ~55 steps per image ≈ 3000 steps. I increase the steps when I train at a lower overall LR.
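
Footnote 1 is easy to encode as a helper if you're scripting your runs:

```python
def max_train_steps(num_images: int, steps_per_image: int = 55) -> int:
    """Rule of thumb: 50-75 steps per image at a UNet LR of 1e-4 to 5e-4;
    increase steps when training at a lower overall LR."""
    return num_images * steps_per_image

print(max_train_steps(54))  # 2970 - rounded up to 3000 for this model
```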

[^2]: I find that TI trains the style well but overfits the concepts. I'll explore this in more depth with some concept specific training runs.

Even More Advanced Options

  • gradient_accumulation_steps: 1
  • Train Batch Size: 4
  • num_train_epochs: 1
  • prior_loss_weight: 1
  • gradient_checkpointing: On
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • Scale Learning Rate: Off
  • lr_scheduler: cosine_with_restarts
  • lr_power: 1
  • lr_warmup_steps: 0
  • Dataloader num workers: 0
  • local_rank: -1
  • Use Prodigy Beta 3: No
  • Adam Weight Decay: 0.0001
  • Use Adam Weight Decay Text Encoder: No
  • Adam Weight Decay Text Encoder: 0
  • Adam Epsilon: 1e-8
  • Prodigy Use Bias Correction: No
  • Prodigy Safeguard Warmup: No
  • Max Grad Norm: 1
  • enable_xformers_memory_efficient_attention: Off

With these settings you should be able to train styles fairly quickly. If you find that you are overfitting, I tend to recommend reducing the UNet training steps by 100-200. At a 1e-4-scale learning rate, small step changes can make a big impact, as opposed to slower 1e-5-scale LRs.

If you find the style is training well but the concepts are overfitting, reduce the TE percentage to 0.15 or 0.1. In this case you may also need to increase the UNet training steps by 100-200. If the issue persists, I recommend revisiting your dataset and ensuring that your images aren't repeats, or simply too consistent.

If you are underfitting, try to fix it first by incrementally increasing the overall step count, 100 steps at a time. Typically it is better to overfit a LoRA training than underfit, because you can balance it out by reducing the LoRA strength during inference, as in the sketch below.
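
Reducing LoRA strength at inference is a one-line change in diffusers. A minimal sketch, again assuming an SDXL base and a hypothetical LoRA repo id:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("your-username/your-lora")  # hypothetical repo id

# Dial the LoRA back from full strength to rein in an overfit style.
# (With PEFT-backed LoRAs, pipe.set_adapters() with adapter_weights is an alternative.)
image = pipe(
    "a woman, style of TOK",
    cross_attention_kwargs={"scale": 0.8},
).images[0]
```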

Results

I wanted to share some examples of the final outputs for this model, to show you the range these parameters create. For inference, I always use the base model the style was trained on (crucial for good results, in my opinion) and the following inference parameters:

[Image: inference parameters]


a woman, style of TOK

[Image: output for the prompt above]

a robot flying through the sky with smoke billowing behind it

[Image: output for the prompt above]

a woman with blonde-brown hair, white t shirt, black linen shorts

[Image: output for the prompt above]

a woman with blonde-brown hair, white t shirt, black linen shorts, anime style illustration, big round glasses, style of TOK

[Image: output for the prompt above]

a meatball sub, style of TOK

[Image: output for the prompt above]

Final Observations

  • Different prompts relate to the unique token in different ways. This could be addressed with a mixed caption style, which would put less weight on prompt syntax and on the relationship between the unique token and the concepts in each image.
  • The most variety is certainly in the women and girls created with the prompts. This is no surprise given the range in the dataset, and they respond well to different stylized prompts.
  • Concepts not present in the original dataset at all often require the unique token. This could be adjusted by making changes to the captioning style.