---
license: creativeml-openrail-m
tags:
- stable-diffusion
- anime
- aiart
---

I have decided to stop creating a separate repository for each model, so most future models will go here.
I will only create a separate repository for more important projects.
Despite the name YuriDiffusion, I am not sure whether I will really train such a model.
Dataset collection and the hard limits of SD both make this task very challenging.


### List of models

- [suremio-nozomizo-eilanya-maplesally](https://huggingface.co/alea31415/YuriDiffusion/blob/main/suremio-nozomizo-eilanya-maplesally/README.md)
![native-00001-2487967310](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/suremio-nozomizo-eilanya-maplesally/samples/native-00001-2487967310.png)
![lora-00015-969084934](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/suremio-nozomizo-eilanya-maplesally/samples/lora-00015-969084934.png)
![lora-00071-2365359196](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/suremio-nozomizo-eilanya-maplesally/samples/lora-00071-2365359196.png)
![native-00008-4163909665](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/suremio-nozomizo-eilanya-maplesally/samples/native-00008-4163909665.png)
![lora-00017-691849602](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/suremio-nozomizo-eilanya-maplesally/samples/lora-00017-691849602.png)

- [onimai mahiro and mihari](https://huggingface.co/alea31415/YuriDiffusion/blob/main/onimai/README.md)
![00034-2431887953](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/onimai/samples/00034-2431887953.png)
![00026-4010692159](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/onimai/samples/00026-4010692159.png)
![00030-286171376](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/onimai/samples/00030-286171376.png)

- [grass wonder from umamusume](https://huggingface.co/alea31415/YuriDiffusion/blob/main/grasswonder-umamusume/README.md)
![native-00025-570458801](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/grasswonder-umamusume/samples/native-00025-570458801.png)
![lora-00029-4052818765](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/grasswonder-umamusume/samples/lora-00029-4052818765.png)
![lora-00030-4052818765](https://huggingface.co/alea31415/YuriDiffusion/resolve/main/grasswonder-umamusume/samples/lora-00030-4052818765.png)


### Questions that I have partial answers to


**Can we make an image of multiple known characters?**

This is possible through native training, LoRA, or merging.
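For merging, here is a minimal sketch of the weighted-sum approach between two single-character models (the file names, the 0.5 ratio, and the assumption that checkpoints store weights under a `state_dict` key are all placeholders, not settings I used):

```python
import torch

# Weighted-sum merge of two SD-style checkpoints.
a = torch.load("model_char_a.ckpt", map_location="cpu")["state_dict"]
b = torch.load("model_char_b.ckpt", map_location="cpu")["state_dict"]

alpha = 0.5  # interpolation weight between the two models
merged = {k: alpha * a[k] + (1 - alpha) * b[k] for k in a if k in b}
torch.save({"state_dict": merged}, "merged_two_chars.ckpt")
```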

**Can we use embeddings, LoRA, and native training together?**

Interestingly, an independently trained LoRA and embedding, or independent fine-tuning and an embedding, go hand in hand.
On the other hand, applying a LoRA on top of a model fine-tuned for the same character does not seem to give good results.
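As an illustration, here is a hedged sketch of stacking an independently trained embedding and LoRA at inference time with diffusers (the file names and the `<char>` token are hypothetical, and these loader methods require a recent diffusers version; this is not necessarily the workflow I used):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a base model, then stack a textual inversion embedding and a LoRA
# that were trained independently of each other.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("char_embedding.pt", token="<char>")  # hypothetical file
pipe.load_lora_weights("char_lora.safetensors")                   # hypothetical file

image = pipe("<char>, 1girl, looking at viewer").images[0]
image.save("combined.png")
```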


### Questions that I do not have answers to

**Native training or LoRA?**

I tried both for [suremio-nozomizo-eilanya-maplesally](https://huggingface.co/alea31415/YuriDiffusion/blob/main/suremio-nozomizo-eilanya-maplesally/README.md)
and [grass wonder from umamusume](https://huggingface.co/alea31415/YuriDiffusion/blob/main/grasswonder-umamusume/README.md), so you can compare them.
Clearly, LoRA has certain advantages:
- Faster to train
- Lower vram requirement
- Smaller size

Moreover, with the same number of steps, LoRA seems to give better fidelity when trained with a large learning rate (1e-4).
Nonetheless, this may also be a form of overfitting, and it is unclear whether we are just trading flexibility for fidelity here.
In fact, I observe that applying a LoRA can have a significant impact on the style of the base model.
The question is whether we can find a better trade-off with a smaller learning rate.

Another advantage, though not inherent to the method, is that we can now use a LoRA directly with any network.
This should be possible for full models as well through add-difference merging, but unfortunately the current interface does not support on-the-fly merging.
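For reference, a sketch of what add-difference merging would look like if done offline (paths and the multiplier are placeholders; checkpoints are assumed to store weights under a `state_dict` key):

```python
import torch

# Add-difference merging: graft what the character fine-tune learned
# (finetuned - base) onto an arbitrary target model, which is roughly
# what applying a LoRA on the fly achieves.
base = torch.load("base.ckpt", map_location="cpu")["state_dict"]
tuned = torch.load("character_finetune.ckpt", map_location="cpu")["state_dict"]
target = torch.load("other_model.ckpt", map_location="cpu")["state_dict"]

m = 1.0  # strength of the grafted difference
merged = {k: target[k] + m * (tuned[k] - base[k])
          for k in target if k in base and k in tuned}
torch.save({"state_dict": merged}, "add_difference.ckpt")
```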

**Clip skip 1 or 2?**

I played with this in my [EuphiAni model](https://huggingface.co/alea31415/EuphiAni-TenseiOujo), but I cannot really judge whether training with clip skip 1 or 2 is better.
Changing the prompt can always make things more favorable to one model than another.
More surprisingly, I observe that even for models trained with clip skip 2, it may still be better to do inference with clip skip 1.
This is not so much the case for LoRA, which can be explained by the fact that LoRA quickly biases the model towards the training distribution, so matching training and inference settings matters more.
With native training, the model retains its capacity to do inference at clip skip 1.
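To make the terminology concrete, here is a rough sketch of what clip skip means at inference time. It mirrors the common webui convention (clip skip 2 = take the penultimate text-encoder layer, then re-apply the final norm); exact indexing can vary between codebases.

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer("1girl, smile", padding="max_length", max_length=77,
                   truncation=True, return_tensors="pt")
out = text_encoder(tokens.input_ids, output_hidden_states=True)

clip_skip = 2  # clip skip 1 == the standard last-layer output
hidden = out.hidden_states[-clip_skip]                   # stop one layer earlier
cond = text_encoder.text_model.final_layer_norm(hidden)  # re-apply the final norm
```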

**Learning rate?**

- For native training, something around 1e-6 works well.
- For LoRA, the default 1e-4 learns the concept quite fast, but there may be some overfitting. On the other hand, 1e-5 seems too low. More experiments are still needed; see the toy sketch below for how these rates map onto the two setups.
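A toy illustration of the two setups, with stand-in modules rather than a real UNet (not a full training script):

```python
import torch
import torch.nn as nn

unet = nn.Linear(320, 320)                 # stand-in for the full UNet
lora_down = nn.Linear(320, 4, bias=False)  # stand-in low-rank factors
lora_up = nn.Linear(4, 320, bias=False)

# Native fine-tuning updates every weight, so it needs a small rate;
# LoRA only updates the low-rank factors and tolerates a larger one.
native_opt = torch.optim.AdamW(unet.parameters(), lr=1e-6)
lora_opt = torch.optim.AdamW(
    [*lora_down.parameters(), *lora_up.parameters()], lr=1e-4
)
```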