license: mit
---

# CycleGAN for unpaired image-to-image translation

## Model description

CycleGAN for unpaired image-to-image translation.
Given two image domains A and B, the following components are trained end-to-end to translate between them:
- A generator A to B, named G_AB, conditioned on an image from A
- A generator B to A, named G_BA, conditioned on an image from B
- A domain classifier D_A, associated with G_AB
- A domain classifier D_B, associated with G_BA

At inference time, G_AB or G_BA are used to translate images, respectively from A to B or from B to A.
In the general setting, this technique provides style transfer between the selected image domains A and B:
the translation produced by G_AB of an image from domain A resembles the distribution of the images from domain B, and vice versa for the generator G_BA.
Within this framework, the technique has been used to perform style transfer between NFT collections:
one collection is selected as domain A, another one as domain B, and the CycleGAN provides forward and backward translation between A and B.
This has been shown to allow high-quality translation even in the absence of paired sample/ground-truth data.
In particular, the model performs well with stationary backgrounds (no drastic texture changes in the appearance of backgrounds), as it is capable of recognizing the attributes of each element of an NFT collection.
An attribute can be a variation in the type of worn fashion items such as sunglasses, earrings and clothes, or a face or body attribute defined with respect to the common template model of the given NFT collection.
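
For illustration, here is a minimal sketch of the inference-time direction of the two generators, using randomly initialized `GeneratorResNet` modules with the same configuration as the released checkpoints (9 residual blocks, 256x256 RGB inputs) rather than trained weights:

```python
import torch
from huggan.pytorch.cyclegan.modeling_cyclegan import GeneratorResNet

# placeholder generators with the architecture used by this model (not trained weights)
G_AB = GeneratorResNet(input_shape=(3, 256, 256), num_residual_blocks=9)
G_BA = GeneratorResNet(input_shape=(3, 256, 256), num_residual_blocks=9)

real_A = torch.rand(1, 3, 256, 256)  # a (normalized) image from domain A
fake_B = G_AB(real_A)                # forward translation A -> B
cycled_A = G_BA(fake_B)              # backward translation B -> A, used for cycle consistency
```

See the next section for how to load the trained generators from the Hub.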

## Intended uses & limitations

#### How to use

```python
import json

import torch
from PIL import Image
from accelerate import Accelerator
from huggan.pytorch.cyclegan.modeling_cyclegan import GeneratorResNet
from huggingface_hub import file_download
from torchvision import transforms as T
from torchvision.transforms import Compose, ToTensor, Normalize
from torchvision.utils import make_grid

# NOTE: `Trainer` (the Lightweight GAN wrapper) and `timestamped_filename` are utilities
# from the huggingnft codebase and must be imported from there. `model_name` is the id of a
# translation model in the huggingnft organization (of the form "COLLECTION_A__2__COLLECTION_B")
# and `nrows` is the number of source images to generate.


def load_lightweight_model(model_name):
    # download the Lightweight GAN config from the Hub and rebuild the trainer around it
    file_path = file_download.hf_hub_download(
        repo_id=model_name,
        filename="config.json"
    )
    config = json.loads(open(file_path).read())
    organization_name, name = model_name.split("/")
    model = Trainer(**config, organization_name=organization_name, name=name)
    model.load(use_cpu=True)
    model.accelerator = Accelerator()
    return model


def get_concat_h(im1, im2):
    # concatenate two PIL images horizontally: source on the left, translation on the right
    dst = Image.new('RGB', (im1.width + im2.width, im1.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (im1.width, 0))
    return dst


n_channels = 3
image_size = 256
input_shape = (image_size, image_size)

transform = Compose([
    T.ToPILImage(),
    T.Resize(input_shape),
    ToTensor(),
    Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Load the translation model from source to target images: the source images are generated
# by a separate Lightweight GAN, while the target images are the result of the translation
# applied by the GeneratorResNet to the generated source images.
# Hence, given the source domain A and the target domain B,
# B = Translator(GAN(A))
translator = GeneratorResNet.from_pretrained(
    f'huggingnft/{model_name}',
    input_shape=(n_channels, image_size, image_size),
    num_residual_blocks=9
)

# sample noise that is used to generate source images by the Lightweight GAN
z = torch.randn(nrows, 100, 1, 1)

# load the GAN generator of the source images that will be translated by the translation model
model = load_lightweight_model(f"huggingnft/{model_name.split('__2__')[0]}")
collectionA = model.generate_app(
    num=timestamped_filename(),
    nrow=nrows,
    checkpoint=-1,
    types="default"
)[1]

# resize to the translator model's input shape
resize = T.Resize((256, 256))
input = resize(collectionA)

# translate the resized collectionA to collectionB
collectionB = translator(input)

# put each source image and its translation side by side
out_transform = T.ToPILImage()
results = []
for collA_image, collB_image in zip(input, collectionB):
    results.append(
        get_concat_h(
            out_transform(make_grid(collA_image, nrow=1, normalize=True)),
            out_transform(make_grid(collB_image, nrow=1, normalize=True))
        )
    )
```
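
Each element of `results` is a standard PIL image, so the side-by-side comparisons can be written to disk directly; a minimal sketch (the output directory name is arbitrary):

```python
import os

os.makedirs("translations", exist_ok=True)
for idx, img in enumerate(results):
    # each entry is the horizontal concatenation of a source image and its translation
    img.save(os.path.join("translations", f"pair_{idx}.png"))
```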

#### Limitations and bias

Translation between collections produces excellent output images when the NFT collections portray their subjects in a similar way.
If the backgrounds vary too much within either of the collections, performance degrades, or many more training iterations are required to achieve acceptable results.

## Training data

The CycleGAN model is trained on an unpaired dataset of samples from two selected NFT collections: collectionA and collectionB.
To this end, the two collections are loaded by means of the `load_dataset` function of the Hugging Face `datasets` library, as follows.
A list of all available collections is available at [huggingNFT](https://huggingface.co/huggingnft).

```python
from datasets import load_dataset

collectionA = load_dataset("huggingnft/COLLECTION_A")
collectionB = load_dataset("huggingnft/COLLECTION_B")
```
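
As a quick sanity check, a sample from each collection can be inspected; this sketch assumes the collections expose a `train` split with an `image` column, as is common for image datasets on the Hub:

```python
# assumption: a `train` split with an `image` column holding PIL images
sampleA = collectionA["train"][0]["image"]
sampleB = collectionB["train"][0]["image"]
print(sampleA.size, sampleB.size)
```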

## Training procedure

#### Preprocessing

The following transformations are applied to each input sample of collectionA and collectionB.
The input size is fixed to RGB images of height and width equal to 256.

```python
from torchvision import transforms as T
from torchvision.transforms import Compose, ToTensor, Normalize

n_channels = 3
image_size = 256
input_shape = (image_size, image_size)

transform = Compose([
    T.ToPILImage(),
    T.Resize(input_shape),
    ToTensor(),
    Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```
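
Note that `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` maps pixel values from [0, 1] to [-1, 1]. To display generated tensors, the usage example above relies on `make_grid(..., normalize=True)` for this; an explicit inverse transform is sketched here:

```python
# invert y = (x - 0.5) / 0.5, i.e. recover x = (y + 1) / 2
denormalize = Normalize((-1.0, -1.0, -1.0), (2.0, 2.0, 2.0))
```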

#### Hardware

The configuration has been tested on single-GPU setups with an RTX5000 and an A5000, as well as on multi-GPU, single-rank distributed setups composed of two of those GPUs.

#### Hyperparameters

The following configuration has been kept fixed for all translation models:
- learning rate: 0.0002
- number of epochs: 200
- learning rate decay activation at epoch 80
- number of residual blocks of the CycleGAN: 9
- cycle loss weight: 10.0
- identity loss weight: 5.0
- optimizer: Adam with beta1 0.5 and beta2 0.999
- batch size: 8
- no mixed precision training

A sketch of how the loss weights enter the CycleGAN objective is given below.
## Eval results

#### Training reports

[Cryptopunks to boredapeyachtclub](https://wandb.ai/chris1nexus/experiments--experiments_cyclegan_punk_to_apes_HQ--0/reports/CycleGAN-training-report--VmlldzoxODUxNzQz?accessToken=vueurpbhd2i8n347j880yakggs0sqdf7u0hpz3bpfsbrxcmk1jk4obg18f6wfk9w)

[Boredapeyachtclub to mutant-ape-yacht-club](https://wandb.ai/chris1nexus/experiments--my_paperspace_boredapeyachtclub__2__mutant-ape-yacht-club--11/reports/CycleGAN-training-report--VmlldzoxODUxNzg4?accessToken=jpyviwn7kdf5216ycrthwp6l8t3heb0lt8djt7dz12guu64qnpdh3ekecfcnoahu)

#### Generated Images

In the provided images, row 0 and row 2 contain real images from the respective collections.
Row 1 is the translation of the images immediately above it in row 0 by means of the G_AB translation model.
Row 3 is the translation of the images immediately above it in row 2 by means of the G_BA translation model.

Visualization over the training iterations for [boredapeyachtclub to mutant-ape-yacht-club](https://wandb.ai/chris1nexus/experiments--my_paperspace_boredapeyachtclub__2__mutant-ape-yacht-club--11/reports/Shared-panel-22-04-15-08-04-99--VmlldzoxODQ0MDI3?accessToken=45m3kxex5m3rpev3s6vmrv69k3u9p9uxcsp2k90wvbxwxzlqbqjqlnmgpl9265c0)

Visualization over the training iterations for [Cryptopunks to boredapeyachtclub](https://wandb.ai/chris1nexus/experiments--experiments_cyclegan_punk_to_apes_HQ--0/reports/Shared-panel-22-04-17-11-04-83--VmlldzoxODUxNjk5?accessToken=o25si6nflp2xst649vt6ayt56bnb95mxmngt1ieso091j2oazmqnwaf4h78vc2tu)

### References
```bibtex
@misc{https://doi.org/10.48550/arxiv.1703.10593,
  doi = {10.48550/ARXIV.1703.10593},
  url = {https://arxiv.org/abs/1703.10593},
  author = {Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A.},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
  title = {Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  publisher = {arXiv},
  year = {2017},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
```

### BibTeX entry and citation info

```bibtex
@InProceedings{huggingnft,
  author = {Aleksey Korshuk and Christian Cancedda},
  year = {2022}
}
```