---
license: creativeml-openrail-m
---

This is a low-quality bocchi-the-rock (ぼっち・ざ・ろっく!) character model.
Similar to my [yama-no-susume model](https://huggingface.co/alea31415/yama-no-susume), this model is capable of generating **multi-character scenes** beyond images of a single character.
Of course, the result is still hit-or-miss, but with some luck you can get the entire Kessoku Band right in one shot,
and otherwise you can always rely on inpainting.
Here are two examples:

With inpainting
![4265343062-1047638199](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/with_inpaint/4265343062-1047638199.png)

Without inpainting
![4265343086-2648280139](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343086-2648280139.png)


### Characters

The model knows 12 characters from Bocchi the Rock!.
The resemblance to a character can be improved by describing their appearance in more detail (for example, by adding long wavy hair to ShimizuEliza).

![xy_grid-0028-24](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/grids/xy_grid-0028-24.jpg)
![xy_grid-0029-24](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/grids/xy_grid-0029-24.jpg)
![xy_grid-0030-24](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/grids/xy_grid-0030-24.jpg)


### Dataset description

The dataset contains around 27K images with the following composition:
- 7024 anime screenshots
- 1630 fan arts
- 18519 customized regularization images

The model is trained with a specific weighting scheme to balance the different concepts.
For example, the three categories above are weighted 0.3, 0.25, and 0.45 respectively.
Each category is itself split hierarchically into many sub-categories.
For more details on the data preparation process, please refer to https://github.com/cyber-meow/anime_screenshot_pipeline
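
To make the weighting concrete, here is a minimal sketch of how top-level category weights could be turned into per-image repeat factors. It is not the actual pipeline: the function names `per_image_multipliers` and `effective_shares` are hypothetical, the hierarchical sub-category weighting is ignored, and normalizing so that the least-repeated category has a factor of 1 is an assumption made for illustration.

```python
def per_image_multipliers(counts, weights):
    """Per-image repeat factor for each category (hypothetical helper).

    A category's share of an epoch is proportional to
    (number of images) * (repeat factor), so the factor must be
    proportional to weight / count.
    """
    rates = {name: weights[name] / counts[name] for name in counts}
    # Assumption: scale so the least-repeated category gets factor 1.0.
    base = min(rates.values())
    return {name: rate / base for name, rate in rates.items()}


def effective_shares(counts, multipliers):
    """Fraction of an epoch taken by each category after repetition."""
    totals = {name: counts[name] * multipliers[name] for name in counts}
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Image counts and weights taken from the dataset description above.
counts = {"screenshots": 7024, "fanarts": 1630, "regularization": 18519}
weights = {"screenshots": 0.3, "fanarts": 0.25, "regularization": 0.45}

multipliers = per_image_multipliers(counts, weights)
shares = effective_shares(counts, multipliers)

for name in counts:
    print(f"{name}: multiply x{multipliers[name]:.2f}, share {shares[name]:.2f}")
```

With these numbers the fan arts, being the smallest category, end up repeated the most, while the regularization images are repeated the least.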


### Training Details

#### Trainer
The model is trained using [EveryDream1](https://github.com/victorchall/EveryDream-trainer),
as it seems to be the only trainer out there that supports sample weighting (through the use of `multiply.txt`).
For future training it would make sense to migrate to [EveryDream2](https://github.com/victorchall/EveryDream2trainer).

#### Hardware and cost
The model is trained on runpod using a 3090 and cost me around 15 dollars.

#### Hyperparameter specification

The model is trained for 50000 steps at batch size 4, learning rate 1e-6, resolution 512, and a conditioning dropout rate of 10%.

Note that the weighting scheme translates into a different multiply value for each image,
so the usual notions of repeat and epoch mean something quite different here.
For example, with my weighting an epoch contains around 300K images (some images are used multiple times),
so the 50000 steps at batch size 4 did not even complete a single epoch.
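
The arithmetic behind this can be checked in a few lines. The sketch below uses the step count and batch size stated above and treats the approximate 300K figure as exact, so the resulting fraction is only a rough estimate; the variable names are illustrative.

```python
steps = 50_000
batch_size = 4
images_per_epoch = 300_000  # approximate figure, treated as exact here

# Total number of (possibly repeated) images seen during training.
samples_seen = steps * batch_size

# Fraction of one weighted epoch that was covered.
epoch_fraction = samples_seen / images_per_epoch

# Steps that a full weighted epoch would have required.
steps_per_epoch = images_per_epoch / batch_size

print(f"samples seen: {samples_seen}")
print(f"fraction of an epoch: {epoch_fraction:.2f}")
print(f"steps for one full epoch: {steps_per_epoch:.0f}")
```

Under these assumptions the run covers about two thirds of one weighted epoch.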

### Failures

- For the first 24000 steps I used the trigger words `Bfan1` and `Bfan2` for the two fans of Bocchi.
  However, these two words are too similar and the model failed to learn distinct characters for them.
  Therefore I changed `Bfan2` to `Bofa2` at step 24000, which seemed to solve the problem.
- Character blending is always an issue.
- When prompting for the four characters of Kessoku Band, we often get side shots.
  I think this is caused by overfitting to a particular image.


### More Example Generations

With inpainting
![4265343068-2420755431](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/with_inpaint/4265343068-2420755431.png)
![4265343066-3979275255](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/with_inpaint/4265343066-3979275255.png)
![4265343022-3534836762](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/with_inpaint/4265343022-3534836762.png)


Without inpainting
![4265343092-803155289](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343092-803155289.png)
![4265343053-918713189](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343053-918713189.png)
![4265343054-2839948768](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343054-2839948768.png)
![4265343096-399054050](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343096-399054050.png)
![4265343100-3858388158](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343100-3858388158.png)
![4265343016-2842516738](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343016-2842516738.png)
![4265343084-3548261345](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343084-3548261345.png)
![4265343083-1372779456](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/without_inpaint/4265343083-1372779456.png)

Some failure cases
![4265343089-2940163958](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/failure/4265343089-2940163958.png)
![4265343091-129639375](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/failure/4265343091-129639375.png)
![4265343048-2869643584](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/failure/4265343048-2869643584.png)
![4265343039-1470057774](https://huggingface.co/alea31415/bocchi-the-rock-character/resolve/main/examples/failure/4265343039-1470057774.png)