Merge branch 'main' of https://huggingface.co/Crosstyan/BPModel
README.md CHANGED

@@ -27,10 +27,28 @@ BPModel is an experimental Stable Diffusion model based on [ACertainty](https://
 Why does this model even exist? There are loads of Stable Diffusion models out there, especially anime style models.
 Well, are there any models trained with a base resolution (`base_res`) of 768 or even 1024 before? I don't think so.
 Here it is, BPModel, a Stable Diffusion model you may love or hate.
-Trained with 5k high quality images that suit my taste (not necessarily yours, unfortunately) from [Sankaku Complex](https://chan.sankakucomplex.com) with annotations.
-
-
-
+Trained with 5k high quality images that suit my taste (not necessarily yours, unfortunately) from [Sankaku Complex](https://chan.sankakucomplex.com) with annotations.
+The dataset is public in [Crosstyan/BPDataset](https://huggingface.co/datasets/Crosstyan/BPDataset) for the sake of full disclosure.
+A pure combination of tags may not be the optimal way to describe an image,
+but I don't need to do extra work.
+And no, I won't feed any AI generated images to the model,
+even if that might outlaw the model from being used in some countries.
+
+Training a high resolution model requires a significant amount of GPU
+hours and can be costly. In this particular case, 10 V100 GPU hours were spent
+on training 30 epochs at a resolution of 512, while 60 V100 GPU hours were spent
+on training 30 epochs at a resolution of 768. An additional 100 V100 GPU hours
+were spent on training a model at a resolution of 1024, although **ONLY** 10
+epochs were run. The results at 1024 did not show a significant improvement
+over the 768 resolution model, and the resource demands were high: only a
+batch size of 1 was achievable on a V100 with 32G VRAM. However, training at
+768 did yield better results than training at 512, and it is worth considering
+as an option. It is worth noting that Stable Diffusion 2.x also chose to train
+a 768 resolution model. Still, it may be more efficient to start by training
+a 512 resolution model, due to the slower training process and the additional
+prior knowledge needed to speed things up at a 768 resolution.
 
 [Mikubill/naifu-diffusion](https://github.com/Mikubill/naifu-diffusion) is used as the training script, and I also recommend
 checking out [CCRcmcpe/scal-sdt](https://github.com/CCRcmcpe/scal-sdt).
@@ -85,7 +103,7 @@ better than some artist style DreamBooth model which only train with a few
 hundred images or even fewer. I also oppose changing style by merging models, since you
 could apply a different style by training with proper captions and prompting.
 
-Besides some of images in my dataset
+Besides, some of the images in my dataset have the artist name in the caption; however, some artist names will
 be misinterpreted by CLIP when tokenizing. For example, *as109* will be tokenized as `[as, 1, 0, 9]` and
 *fuzichoco* will become `[fu, z, ic, hoco]`. Romanized Japanese suffers from this problem a lot, and
 I don't have a good solution other than changing the artist name in the caption, which is
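
You can inspect these splits yourself. A minimal sketch with the `transformers` library, assuming the `openai/clip-vit-large-patch14` tokenizer used by Stable Diffusion 1.x text encoders (exact sub-token output may vary slightly between tokenizer releases):

```python
# Sketch: see how CLIP's BPE tokenizer splits romanized artist names.
# Requires `pip install transformers`.
from transformers import CLIPTokenizer

# The tokenizer used by Stable Diffusion 1.x text encoders.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for name in ["as109", "fuzichoco"]:
    # Rare names fall apart into several sub-tokens instead of staying
    # one coherent "artist" token, which is the problem described above.
    print(name, "->", tokenizer.tokenize(name))
```
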
@@ -101,6 +119,10 @@ I don't think anyone would like to do. (Could Unstable Diffusion give us surpris
 
 Here are some **cherry picked** samples.
 
+I was using [xformers](https://github.com/facebookresearch/xformers) when generating these samples,
+and it might yield slightly different results even with the same seed (welcome to the non-deterministic field).
+"`Upscale latent space image when doing hires. fix`" was also enabled.
+
 ![orange](images/00317-2017390109_20221220015645.png)
 
 ```txt
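
For readers using `diffusers` instead of the webui, the same xformers attention can be toggled on a pipeline. A minimal sketch, assuming a diffusers-format copy of the weights is available under this repo id (adjust the path to your local setup):

```python
# Sketch: toggling xformers memory-efficient attention in diffusers.
# The xformers kernels are non-deterministic, so the same seed may not
# reproduce a sample exactly, as noted above.
import torch
from diffusers import StableDiffusionPipeline

# Assumption: a diffusers-format copy of the weights; swap in your own path.
pipe = StableDiffusionPipeline.from_pretrained(
    "Crosstyan/BPModel", torch_dtype=torch.float16
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()    # faster, non-deterministic
# pipe.disable_xformers_memory_efficient_attention() # back to default attention
```
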
@@ -154,7 +176,9 @@ EMA weight is not included and it's fp16.
 If you want to continue training, use [`bp_1024_e10_ema.ckpt`](bp_1024_e10_ema.ckpt), which is the EMA UNet weight
 in fp32 precision.
 
-For better performance, it is strongly recommended to use Clip skip (CLIP stop at last layers) 2.
+For better performance, it is strongly recommended to use Clip skip (CLIP stop at last layers) 2. It's also recommended to turn on
+"`Upscale latent space image when doing hires. fix`" in the settings of [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui),
+which adds intricate details when using `Highres. fix`.
 
 ## About the Model Name
 
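
Outside the webui, `diffusers` exposes a comparable knob. A minimal sketch, assuming a recent `diffusers` release (both `from_single_file` and the `clip_skip` call argument are newer additions, so check your version); the prompt, seed, and output filename are illustrative, and the clip-skip layer counting differs between the two tools, so verify the value against webui output:

```python
# Sketch: a rough diffusers equivalent of the webui's "Clip skip 2",
# i.e. conditioning on an earlier CLIP text-encoder layer.
import torch
from diffusers import StableDiffusionPipeline

# Assumption: loading the raw checkpoint with from_single_file; point this
# at wherever you saved bp_1024_e10.ckpt.
pipe = StableDiffusionPipeline.from_single_file(
    "bp_1024_e10.ckpt", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "1girl, solo, orange theme",  # illustrative prompt, not from the samples
    width=768,                    # the model's base resolution
    height=768,
    clip_skip=2,                  # assumption: intended to mirror the webui's
                                  # "Clip skip 2"; the two tools count layers
                                  # differently, so double-check this value
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("out.png")
```

The "`Upscale latent space image when doing hires. fix`" switch is a webui-specific feature and, as far as I know, has no direct one-line counterpart here.
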