wolfgangblack
committed on
Update README.md
README.md
CHANGED
@@ -7,7 +7,9 @@ Age-classifying Generative Entity Vision Transformer
 
 A Vision Transformer finetuned to classify images of human faces into 'minor' or 'adult'.
 
-This model is a finetuned version of https://huggingface.co/nateraw/vit-age-classifier which was finetuned on the fairface dataset.
+This model is a finetuned version of https://huggingface.co/nateraw/vit-age-classifier, which was finetuned on the fairface dataset. We then use a dataset of generated humans to teach the model to recognize the composition and styles of anime, cartoons, digital art, etc., as they are created by diffusion models.
+
+Users should note that fairface consists specifically of human faces and perhaps a small portion of the body, similar to a 'headshot', whereas the generated dataset may be headshot style or include more of the body. To allow for better recognition, we did not extract the faces from the generated dataset during training, instead allowing the model to train on the full image.
 
 ## Datasets
 These datasets were used in finetuning; fairface was used to finetune the classifier we built on top of.
@@ -22,7 +24,7 @@ https://civitai.com/models/668458/synthetic-human-dataset
 
 This dataset was fully generated by flux and contains 15k images of men, women, boys, and girls from the front, side, and slightly above. This dataset will be expanded with sd15 images and the model will be retrained.
 
-To use the model
+## To use the model
 
 ```
 import requests
@@ -39,7 +41,7 @@ model_dir = 'civitai/age-vit'
 
 # Init model, transforms
 model = ViTForImageClassification.from_pretrained(model_dir)
-transforms =
+transforms = ViTImageProcessor.from_pretrained(model_dir)
 
 # Transform our image and pass it through the model
 inputs = transforms(im, return_tensors='pt')
@@ -50,4 +52,7 @@ proba = output.logits.softmax(1)
 
 # Predicted Classes
 preds = proba.argmax(1)
+
+# Get label/string prediction
+prediction = model.config.id2label[preds.item()]
 ```
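The last steps of the usage snippet (softmax over the logits, argmax, then a lookup in `id2label`) can be sketched on their own as below. This is a minimal illustration, not the README's code: the helper names `softmax`, `decode`, and `classify_image` are hypothetical, the example logits and the `EXAMPLE_ID2LABEL` mapping are made up (the real mapping comes from `model.config.id2label`), and `classify_image` assumes the lines elided from the diff run a plain forward pass with `model(**inputs)`.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the most likely label and its probability."""
    proba = softmax(logits)
    pred = max(range(len(proba)), key=proba.__getitem__)  # argmax
    return id2label[pred], proba[pred]


def classify_image(im, model_dir: str = "civitai/age-vit") -> Tuple[str, float]:
    """Assumed end-to-end flow for one PIL image; requires torch and transformers."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import ViTForImageClassification, ViTImageProcessor

    model = ViTForImageClassification.from_pretrained(model_dir)
    processor = ViTImageProcessor.from_pretrained(model_dir)
    inputs = processor(im, return_tensors="pt")
    with torch.no_grad():
        output = model(**inputs)  # assumption: the elided README lines do this
    return decode(output.logits[0].tolist(), model.config.id2label)


# Hypothetical mapping and logits, for illustration only.
EXAMPLE_ID2LABEL = {0: "adult", 1: "minor"}
label, confidence = decode([0.3, 2.1], EXAMPLE_ID2LABEL)
print(label, round(confidence, 3))
```

Returning the probability alongside the label lets a caller apply a stricter threshold than a bare argmax, which may matter for a minor/adult classifier.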