---
language:
- en
tags:
- stable-diffusion
- text-to-image
license: bigscience-bloom-rail-1.0
inference: false
---

A project that probably won't lead to anything useful but is still interesting. Compared with finetuning Stable Diffusion, it needs less VRAM, it's faster if you have all the images downloaded, and the models take up less space since you only need CLIP.

A notebook for producing your own "stable inversions" is included in this repo, but I wouldn't recommend doing so (they suck). It works on Colab free tier though.

[link to notebook for you to download](https://huggingface.co/crumb/genshin-stable-inversion/blob/main/stable_inversion%20(1).ipynb)

Loading this into a diffusers-based notebook like [Doohickey](https://github.com/aicrumb/doohickey) might look something like this:

```python
import torch
from huggingface_hub import hf_hub_download

stable_inversion = "user/my-stable-inversion" #@param {type:"string"}
# download the saved token embedding matrix from the Hub
inversion_path = hf_hub_download(repo_id=stable_inversion, filename="token_embeddings.pt")
# swap it into the CLIP text encoder the notebook has already loaded as `text_encoder`
text_encoder.text_model.embeddings.token_embedding.weight = torch.load(inversion_path)
```

It was trained on 1024 images matching the 'genshin_impact' tag on safebooru. In epochs 1 and 2 the model was fed the full captions, in epoch 3 the captions kept 50% of their tags, and in epoch 4 they kept 25% of their tags.
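
To make the caption schedule above concrete, here's a rough sketch of per-epoch tag dropout. It isn't the notebook's actual code: `subsample_caption` and `KEEP_FRACTION` are made-up names, and it assumes captions are whitespace-separated tags, that the kept tags are picked uniformly at random, and that at least one tag always survives. The example caption is illustrative too.

```python
import random

# Fraction of tags kept in each epoch, following the schedule described above.
KEEP_FRACTION = {1: 1.0, 2: 1.0, 3: 0.5, 4: 0.25}


def subsample_caption(caption, epoch, rng):
    """Keep a random subset of a caption's tags for the given epoch.

    Assumptions (not taken from the original training code): tags are
    separated by whitespace, the subset is drawn uniformly at random,
    tag order is preserved, and at least one tag is always kept.
    """
    tags = caption.split()
    if not tags:
        return caption
    keep = max(1, round(len(tags) * KEEP_FRACTION[epoch]))
    # Sample indices rather than tags so the original order can be restored.
    kept_indices = sorted(rng.sample(range(len(tags)), keep))
    return " ".join(tags[i] for i in kept_indices)


rng = random.Random(0)  # seeded so the dropped tags are reproducible
# Hypothetical caption, only for illustration.
caption = "beidou_(genshin_impact) 1girl eyepatch long_hair brown_hair red_eyes smile solo"
for epoch in (1, 2, 3, 4):
    print(epoch, subsample_caption(caption, epoch, rng))
```

Passing in a seeded `random.Random` keeps the dropped tags the same between runs, which helps if you want to compare epochs.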

The learning rate was 1e-3, and the loss curve looked like this:

![](https://pbs.twimg.com/media/FdsdivkWIBQYmZd?format=jpg&name=small)

Samples from this finetuned inversion for the prompt "beidou_(genshin_impact)" using just the 1-4 Stable Diffusion model:

![](https://pbs.twimg.com/media/FdsdL7YWIBEXoqG?format=png&name=small)
![](https://pbs.twimg.com/media/FdsdSUdWICIBarc?format=png&name=small)
![](https://pbs.twimg.com/media/Fdsdte5WIAAUALp?format=png&name=small)
![](https://pbs.twimg.com/media/FdseBVrWIAgLBxM?format=png&name=small)

Sample for the same prompt BEFORE finetuning (same seed as the first finetuned sample):

![](https://pbs.twimg.com/media/FdoL0j1WQAoXbef?format=png&name=small)