JuniorDerp
committed on
Commit 28d43e9
1 Parent(s): 0f657bd
Add model and model card
Browse files
- README.md +94 -0
- config.json +37 -0
- diffusion_pytorch_model.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,94 @@
---
license: cc0-1.0
datasets:
- nyanko7/danbooru2023
- boxingscorpionbagel/e621-2024
library_name: diffusers
---
# LibreVAE

LibreVAE is a Variational Autoencoder designed to serve as a component for future generative modelling projects. It has 8 latent channels and downsamples images spatially by a factor of 8. It was trained using HuggingFace Diffusers and can be loaded with the `AutoencoderKL` class.

## Example Usage
```python
from diffusers import AutoencoderKL
from PIL import Image
from torchvision import transforms

# Map a PIL image to a (3, H, W) tensor scaled to [-1, 1]
transform_image = transforms.Compose([
    transforms.Lambda(lambda x: x.convert("RGB")),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Invert the normalization ((x + 1) / 2) and convert back to a PIL image
untransform_image = transforms.Compose([
    transforms.Normalize((-1, -1, -1), (2, 2, 2)),
    transforms.ToPILImage()
])

model = AutoencoderKL.from_pretrained("scrumptious/librevae-f8-d8").to("cuda").eval()
model.requires_grad_(False)

# For a 512x512 image, image_tensor will be (1, 3, 512, 512)
image_tensor = transform_image(Image.open("sample.png")).unsqueeze(0).to("cuda")
# latent will be (1, 8, 64, 64)
# we multiply it by the scaling factor so it has an approximate mean of 0 and variance of 1
latent = model.encode(image_tensor).latent_dist.sample() * model.config.scaling_factor
# output will be (1, 3, 512, 512)
output = model.decode(latent / model.config.scaling_factor).sample
output_image = untransform_image(output.squeeze(0).clamp(-1, 1).cpu())

output_image.save('sample_decoded.png')
```
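
Note that `latent_dist.sample()` draws a stochastic sample from the encoder's posterior, so repeated encodings of the same image will differ slightly. For a deterministic round trip, Diffusers' latent distribution also exposes `mode()`:

```python
# Deterministic alternative to .sample(): take the posterior's mode (its mean).
latent = model.encode(image_tensor).latent_dist.mode() * model.config.scaling_factor
```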

## Training Details

### Training Datasets

LibreVAE was trained on the e621-2024 and danbooru2023 datasets, both of which are large, curated collections of artwork. While the model was trained primarily on artwork, our testing showed that it also generalizes to other types of images.

### Dataset Preprocessing

We applied a modified version of [NovelAI's Aspect Ratio Bucketing](https://github.com/NovelAI/novelai-aspect-ratio-bucketing) to the images: instead of predetermining the aspect ratio buckets from fixed sizes, we selected them dynamically by running K-Means over our training dataset. We then set the sizes of these buckets to be around 256x256.
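
For illustration, here is a minimal sketch of that bucket-selection idea, assuming scikit-learn's `KMeans`; the function name `select_buckets` and parameters such as `n_buckets` and the snapping multiple are illustrative assumptions, not the actual training code.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_buckets(aspect_ratios, n_buckets=16, target_pixels=256 * 256, multiple=64):
    """Cluster the dataset's aspect ratios, then turn each cluster center
    into a (width, height) bucket holding roughly target_pixels pixels."""
    # Cluster in log space so ratios like 2:1 and 1:2 sit symmetrically around 1:1.
    log_ratios = np.log(np.asarray(aspect_ratios, dtype=np.float64)).reshape(-1, 1)
    kmeans = KMeans(n_clusters=n_buckets, n_init=10, random_state=0).fit(log_ratios)

    buckets = set()
    for (log_r,) in kmeans.cluster_centers_:
        ratio = np.exp(log_r)  # width / height
        # Solve w * h = target_pixels subject to w / h = ratio,
        # then snap both sides to a hardware-friendly multiple.
        height = (target_pixels / ratio) ** 0.5
        width = height * ratio
        buckets.add((
            max(multiple, int(round(width / multiple)) * multiple),
            max(multiple, int(round(height / multiple)) * multiple),
        ))
    return sorted(buckets)
```

Each training image is then assigned to the bucket whose aspect ratio is closest to its own before being resized and cropped.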

### Loss Function

The loss function for this model was MSE\_lab + (0.5 \* MSE\_rgb) + (0.1 \* LPIPS) + (1e-4 \* KL), where MSE\_lab was the mean squared error calculated in CIELAB color space, MSE\_rgb was the mean squared error calculated in RGB color space, LPIPS was the LPIPS loss, and KL was the KL divergence.
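
A minimal sketch of this objective, assuming the `lpips` package for the perceptual term and kornia's RGB-to-LAB conversion; the 1/100 scaling of the LAB values and the mean reductions are assumptions, not details taken from the training code.

```python
import torch.nn.functional as F
import lpips                          # pip install lpips
from kornia.color import rgb_to_lab   # pip install kornia

lpips_net = lpips.LPIPS(net="vgg").eval()

def vae_loss(original, reconstruction, posterior):
    """original / reconstruction: (B, 3, H, W) tensors in [-1, 1];
    posterior: the DiagonalGaussianDistribution returned by model.encode()."""
    # MSE in RGB space.
    mse_rgb = F.mse_loss(reconstruction, original)
    # MSE in CIELAB space; rgb_to_lab expects inputs in [0, 1].
    lab_orig = rgb_to_lab(original.add(1).div(2).clamp(0, 1)) / 100.0
    lab_recon = rgb_to_lab(reconstruction.add(1).div(2).clamp(0, 1)) / 100.0
    mse_lab = F.mse_loss(lab_recon, lab_orig)
    # LPIPS perceptual distance (the lpips package expects [-1, 1] inputs).
    perceptual = lpips_net(reconstruction, original).mean()
    # KL divergence of the latent posterior against a unit Gaussian.
    kl = posterior.kl().mean()
    return mse_lab + 0.5 * mse_rgb + 0.1 * perceptual + 1e-4 * kl
```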

### Other Details

- **Precision:** BF16 mixed precision
- **Learning Rate:** 1e-4 with a 50% decay per epoch (see the sketch after this list)
- **Epochs:** 2
- **Optimizer:** AdamW
- **Batch Size:** 2 (per-GPU batch) \* 2 (GPUs) \* 128 (gradient accumulation steps) = 512
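
One plausible reading of the learning rate schedule, sketched with PyTorch's `LambdaLR`; whether the decay was stepwise or continuous within an epoch is not specified, so treat this as an assumption.

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

optimizer = AdamW(model.parameters(), lr=1e-4)
# Halve the learning rate after each epoch: lr(epoch) = 1e-4 * 0.5 ** epoch.
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 0.5 ** epoch)

# for epoch in range(2):
#     train_one_epoch(...)  # hypothetical helper
#     scheduler.step()
```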

### Validation Performance

The model achieved the following scores in its final validation run.

- **MSE in CIELAB space:** 0.002154
- **MSE in RGB space:** 0.0062
- **LPIPS:** 0.0555

## Uses

LibreVAE is intended to be used by researchers and developers as a component of generative models, such as text-to-image models. The developers do not foresee any direct uses that would not be better served by an existing image compression solution.

## License

The weights for LibreVAE are released under the [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/) license.

## Citation

Under the CC0 1.0 license, you are not required to provide any attribution when using or redistributing LibreVAE. If you use LibreVAE in your research or projects and would like to provide attribution, you can cite it as:
```
@misc{LibreVAE2024,
  title={LibreVAE},
  author={Scrumptious AI Labs},
  year={2024},
  note={https://huggingface.co/scrumptious/librevae-f8-d8}
}
```

## Acknowledgments

Special thanks to the contributors of the e621-2024 and danbooru2023 datasets, the HuggingFace Diffusers team, and the PyTorch Lightning team.
config.json
ADDED
@@ -0,0 +1,37 @@
{
  "_class_name": "AutoencoderKL",
  "_diffusers_version": "0.30.3",
  "act_fn": "silu",
  "block_out_channels": [
    128,
    256,
    512,
    512
  ],
  "down_block_types": [
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D",
    "DownEncoderBlock2D"
  ],
  "force_upcast": false,
  "in_channels": 3,
  "latent_channels": 8,
  "latents_mean": null,
  "latents_std": null,
  "layers_per_block": 2,
  "mid_block_add_attention": true,
  "norm_num_groups": 32,
  "out_channels": 3,
  "sample_size": 32,
  "scaling_factor": 0.9615,
  "shift_factor": null,
  "up_block_types": [
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D",
    "UpDecoderBlock2D"
  ],
  "use_post_quant_conv": true,
  "use_quant_conv": true
}
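
A quick, optional sanity check (not part of the repository) of how this config produces the f8/d8 geometry: four `block_out_channels` entries mean three downsampling stages, so the spatial factor is 2^3 = 8, and `latent_channels` is 8.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("scrumptious/librevae-f8-d8")
with torch.no_grad():
    latent = vae.encode(torch.zeros(1, 3, 256, 256)).latent_dist.mode()
print(latent.shape)  # torch.Size([1, 8, 32, 32]) -> 8x spatial reduction, 8 channels
```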
diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d72453572b449c46d15a257872d65cecefb2fab05fdcd7d33fdcdb37f3ef8969
size 334865516