Models from the experiment at https://github.com/deepghs/tagger_embedding_aligner.

Example usage with `imgutils` (the `dghs-imgutils` package):
```python
import numpy as np
from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction, denormalize_wd14_emb

embedding, (r, g, c) = get_wd14_tags(
    '/my/image.png',
    fmt=('embedding', ('rating', 'general', 'character')),
)

# normal tag results
print('Expected result:')
print(r)
print(g)
print(c)

# normalize embedding
embedding = embedding / np.linalg.norm(embedding)

# bad tag results
br, bg, bc = convert_wd14_emb_to_prediction(embedding)
print('Bad results due to the embedding normalization:')
print(br)
print(bg)
print(bc)

# denormalize this embedding
output = denormalize_wd14_emb(embedding)
print(output.shape)

# should be similar to r, g, c, approx 1e-3 error
rating, general, character = convert_wd14_emb_to_prediction(output)
print('De-normalized result:')
print(rating)
print(general)
print(character)
```

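Normalization only rescales the embedding, so the information it destroys is a single number: the original norm. The numpy sketch below illustrates why that scalar matters. It assumes, for illustration only, that the tagger's head is one linear layer followed by a sigmoid, and it uses random weights in place of a real WD14 model. Dividing by the norm shrinks the embedding term of every logit while the bias stays the same, so the probabilities drift; multiplying the norm back in recovers them. Because `denormalize_wd14_emb` only receives the normalized vector, it presumably has to estimate that missing scale, which is what the aligner models in the table are for; this reading is an interpretation, not something the card states.

```python
import numpy as np

# Hypothetical stand-in for a tagger head (assumption: linear layer + sigmoid).
# Random weights and small sizes are illustrative, not the real WD14 model.
rng = np.random.default_rng(0)
emb_dim, n_tags = 768, 32
weight = rng.normal(scale=0.05, size=(emb_dim, n_tags))
bias = rng.normal(scale=0.1, size=n_tags)

def predict(emb: np.ndarray) -> np.ndarray:
    """Per-tag probabilities: sigmoid(emb @ weight + bias)."""
    return 1.0 / (1.0 + np.exp(-(emb @ weight + bias)))

embedding = rng.normal(size=emb_dim) * 3.0  # raw embedding, norm far from 1
norm = np.linalg.norm(embedding)
normalized = embedding / norm               # unit-length version

original = predict(embedding)
broken = predict(normalized)                # emb @ weight shrinks by 1/norm, bias does not
restored = predict(normalized * norm)       # putting the true norm back undoes the change

print(np.abs(original - broken).max())      # large drift in probabilities
print(np.abs(original - restored).max())    # ~0, up to float rounding
```

With the true norm the recovery is exact; a learned aligner can only approximate it, which is consistent with the small but non-zero prediction errors reported in the table.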
Name | Tagger | Embedding Width | Tags Count | FLOPS | Params | EMB Cosine | EMB Norm | Pred Loss | Pred MSE |
---|---|---|---|---|---|---|---|---|---|
ViT_v3_mnum2_all | ViT_v3 | 768 | 10861 | 0.000398G | 0.40M | 1 | 0.1712 | 0.004306 | 2.116e-08 |
ViT_v3_mnum1_all | ViT_v3 | 768 | 10861 | 0.000709G | 0.71M | 1 | 0.2246 | 0.004306 | 3.991e-08 |
ConvNext_v3_mnum2_all | ConvNext_v3 | 1024 | 10861 | 0.000708G | 0.71M | 1 | 0.1126 | 0.004531 | 2.061e-08 |
ConvNext_v3_mnum1_all | ConvNext_v3 | 1024 | 10861 | 0.001260G | 1.26M | 1 | 0.1473 | 0.004531 | 3.539e-08 |
ViT_mnum2_all | ViT | 768 | 9083 | 0.000398G | 0.40M | 1 | 0.08641 | 0.005199 | 3.797e-09 |
ViT_mnum1_all | ViT | 768 | 9083 | 0.000709G | 0.71M | 1 | 0.1724 | 0.005199 | 1.896e-08 |
ConvNext_mnum2_all | ConvNext | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.05776 | 0.005213 | 7.207e-09 |
ConvNext_mnum1_all | ConvNext | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.07134 | 0.005214 | 1.292e-08 |
ViT_Large_mnum2_all | ViT_Large | 1024 | 10861 | 0.000708G | 0.71M | 1 | 1.403 | 0.003966 | 1.617e-07 |
ViT_Large_mnum1_all | ViT_Large | 1024 | 10861 | 0.001260G | 1.26M | 1 | 1.643 | 0.003966 | 2.24e-07 |
SwinV2_mnum2_all | SwinV2 | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.1257 | 0.004726 | 3.797e-08 |
SwinV2_mnum1_all | SwinV2 | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.1497 | 0.004727 | 5.487e-08 |
EVA02_Large_mnum2_all | EVA02_Large | 1024 | 10861 | 0.000708G | 0.71M | 1 | 1.268 | 0.005948 | 5.466e-08 |
EVA02_Large_mnum1_all | EVA02_Large | 1024 | 10861 | 0.001260G | 1.26M | 1 | 1.713 | 0.005948 | 9.518e-08 |
ConvNextV2_mnum2_all | ConvNextV2 | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.09014 | 0.004596 | 1.43e-08 |
ConvNextV2_mnum1_all | ConvNextV2 | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.1216 | 0.004596 | 2.76e-08 |
SwinV2_v3_mnum2_all | SwinV2_v3 | 1024 | 10861 | 0.000708G | 0.71M | 1 | 0.2129 | 0.004128 | 4.035e-08 |
SwinV2_v3_mnum1_all | SwinV2_v3 | 1024 | 10861 | 0.001260G | 1.26M | 1 | 0.2784 | 0.004129 | 6.893e-08 |
MOAT_mnum2_all | MOAT | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.4662 | 0.004998 | 1.855e-08 |
MOAT_mnum1_all | MOAT | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.7849 | 0.004998 | 5.549e-08 |
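The card does not define the metric columns. One plausible reading, which is an assumption, is that `EMB Cosine` is the cosine similarity between the original and the de-normalized embedding, `EMB Norm` is an error in the recovered norm, and `Pred MSE` is the mean squared error between the two prediction vectors (`Pred Loss` is left out because its definition cannot be inferred from the card). Under that reading, a per-sample version of these comparisons could be sketched as follows; `alignment_metrics` is a hypothetical helper, not part of `imgutils`, and the table presumably averages such values over an evaluation set.

```python
import numpy as np

# Assumed metric definitions; the model card does not specify them.
def alignment_metrics(orig_emb, restored_emb, orig_pred, restored_pred):
    """Compare a restored embedding and its predictions against the originals."""
    orig_norm = np.linalg.norm(orig_emb)
    restored_norm = np.linalg.norm(restored_emb)
    # Cosine similarity: exactly 1.0 when restored_emb is a positive rescaling of orig_emb.
    cosine = float(orig_emb @ restored_emb / (orig_norm * restored_norm))
    # Absolute error between the true norm and the recovered norm.
    norm_error = float(abs(orig_norm - restored_norm))
    # Mean squared error between the two per-tag probability vectors.
    pred_mse = float(np.mean((orig_pred - restored_pred) ** 2))
    return {"cosine": cosine, "norm_error": norm_error, "pred_mse": pred_mse}

rng = np.random.default_rng(0)
orig_emb = rng.normal(size=768)
# Simulate an aligner that keeps the direction but overestimates the norm by 1%.
restored_emb = orig_emb * 1.01
metrics = alignment_metrics(
    orig_emb,
    restored_emb,
    orig_pred=np.array([0.9, 0.2, 0.5]),
    restored_pred=np.array([0.9, 0.2, 0.6]),
)
print(metrics)
```

A pure rescaling leaves the cosine at 1 and shows up only in the norm error, which matches the pattern of a constant `EMB Cosine` of 1 alongside varying `EMB Norm` values in the table.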