pipeline_tag: text-to-image
license: other
license_name: faipl-1.0-sd
license_link: https://freedevproject.org/faipl-1.0-sd/
language:
- en
library_name: diffusers
base_model: Laxhar/noobai-XL-Vpred-0.6
The LHC (Large Heap o' Chuubas) is aiming to be the model for all of your VTuber needs.
The What, The Why, The How
The What
The LHC models are a series of VTuber-centric finetunes. Unlike many small-scale finetunes, which aim to improve aesthetics or a general concept like backgrounds, the LHC models primarily aim to add specific characters while preserving as much of the base model as possible.
The Why
The usual way of adding characters to an already trained model is to use LoRAs or similar methods, which produce a small model that can be applied to existing models, adding concepts in a plug-and-play way. While this is very convenient, and most modern consumer GPUs are capable of training them, LoRAs come with several downsides compared to a model that has been fully trained on these concepts.
- LoRAs will have an effect on composition, style and character knowledge even outside of their intended concept. This is especially apparent in incorrectly trained LoRAs, where the concept is always applied, causing poses to stiffen or characters to take on attributes of the new character. While this bleeding can and does happen in finetuned models as well, the effect is drastically smaller.
- Full finetunes usually result in a model that is very capable of abstracting: even if a specific combination of concepts isn't present in the training data, a well-trained finetune will be able to combine these concepts in an almost logical way.
- As a specific example of the previous point: while style and concept LoRAs usually work quite well when used together with other LoRAs of their type, character LoRAs tend to be much less capable of being combined with other character LoRAs to generate images of multiple new characters. While there are ways to make this work, and some LoRAs are more capable than others, in my experience this limits the results. One workaround is to train a LoRA on multiple characters as well as on images of these characters together, which does work quite well in my experience. However, finding artwork of specific characters together is not always possible, and as more and more characters are added, one must also increase the size of the resulting LoRA so that it can learn all of them. This makes LoRAs less and less effective the more characters one wants to add at once.
- It is possible to extract a LoRA from a finetuned model, meaning the things the finetune learned can be applied to other models that share a similar base. Even if the resulting finetune doesn't work for every application, the extract keeps most of the advantages of a LoRA.
These factors make a full-scale finetune the best option for adding a large number of characters to a model.
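To make the LoRA extraction mentioned above more concrete, here is a minimal sketch of one commonly used approach: take the difference between a finetuned weight matrix and the base weight matrix, then keep only its largest singular components. This illustrates the general idea and is not necessarily how any LHC extract is produced. The function name `extract_low_rank`, the matrix shapes and the rank are all assumptions made for the example.

```python
import numpy as np


def extract_low_rank(base_weight, tuned_weight, rank):
    """Approximate (tuned - base) by two small matrices, LoRA style.

    Hypothetical helper for illustration only: returns `up` with shape
    (out, rank) and `down` with shape (rank, in), so that
    base + up @ down approximates the finetuned weight.
    """
    delta = tuned_weight - base_weight
    # Truncated SVD gives the best rank-`rank` approximation of delta
    # in the least-squares sense.
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    up = u[:, :rank] * s[:rank]  # scale each kept column by its singular value
    down = vt[:rank, :]
    return up, down


# Toy data: a "finetune" whose change to the base weight has rank 2.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 6))
tuned = base + rng.normal(size=(8, 2)) @ rng.normal(size=(2, 6))

up, down = extract_low_rank(base, tuned, rank=2)
reconstruction_error = np.abs(base + up @ down - tuned).max()
```

In a real model the weight difference is usually not exactly low rank, so the extract is an approximation whose quality depends on the chosen rank.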
The How
Thanks to optimizations to model training and techniques like gradient accumulation, it is possible to effectively finetune models even on normal consumer hardware like a single RTX 3090, as long as one has enough time. Additionally, by manually curating datasets of the different characters and using repeats to give them a similar distribution, one can ensure a more balanced learning process.
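The two ideas above can be sketched in plain Python. This is a toy illustration and not the actual LHC training setup: the character names, image counts, the one-parameter model and the helper names `balanced_repeats`, `mse_grad` and `accumulated_grad` are all assumptions made for the example. The first function picks a repeat count per character so that the effective number of samples is roughly equal. The second shows why gradient accumulation works: averaging the gradients of equally sized micro-batches gives the same gradient as one large batch, while only one micro-batch needs to fit in memory at a time.

```python
import math


def balanced_repeats(image_counts, target=None):
    """Choose a repeat count per character so count * repeats is roughly equal.

    Hypothetical helper: `image_counts` maps character name to image count.
    """
    if target is None:
        target = max(image_counts.values())
    # Every character is seen at least once per epoch.
    return {name: max(1, round(target / n)) for name, n in image_counts.items()}


def mse_grad(w, batch):
    """Gradient of the mean squared error of the toy model y = w * x."""
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches."""
    if len(batch) % micro_batch_size != 0:
        raise ValueError("batch size must be a multiple of micro_batch_size")
    steps = len(batch) // micro_batch_size
    total = 0.0
    for i in range(0, len(batch), micro_batch_size):
        micro_batch = batch[i:i + micro_batch_size]
        # Dividing by the number of steps keeps the scale equal to the
        # gradient of one large batch.
        total += mse_grad(w, micro_batch) / steps
    return total


# Made-up dataset sizes for three characters.
counts = {"character_a": 400, "character_b": 100, "character_c": 35}
repeats = balanced_repeats(counts)

# Made-up (x, y) samples for the toy model.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
full = mse_grad(0.5, data)
accumulated = accumulated_grad(0.5, data, micro_batch_size=1)
```

The trade-off is time: each optimizer step now requires several forward and backward passes, which is why this approach works on a single consumer GPU only as long as one has enough time.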