Models from the experiment: https://github.com/deepghs/tagger_embedding_aligner

import numpy as np

from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction, denormalize_wd14_emb

embedding, (r, g, c) = get_wd14_tags(
    '/my/image.png',
    fmt=('embedding', ('rating', 'general', 'character')),
)
# normal tag results
print('Expected result:')
print(r)
print(g)
print(c)

# normalize embedding
embedding = embedding / np.linalg.norm(embedding)
# bad tag results
br, bg, bc = convert_wd14_emb_to_prediction(embedding)
print('Bad results due to the embedding normalization:')
print(br)
print(bg)
print(bc)

# denormalize this embedding
output = denormalize_wd14_emb(embedding)
print(output.shape)

# should be similar to r, g, c, approx 1e-3 error
rating, general, character = convert_wd14_emb_to_prediction(output)
print('De-normalized result:')
print(rating)
print(general)
print(character)
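
The example above expects the de-normalized predictions to match the originals to within roughly 1e-3. A minimal sketch of checking that numerically follows, assuming each prediction is a plain mapping from tag name to score. `compare_predictions` and the demo scores are hypothetical and not part of `imgutils`. Because the general and character results are filtered by a threshold, a tag close to the threshold may appear in only one of the two results, so those tags are reported separately instead of being counted as a large error.

```python
def compare_predictions(expected, actual):
    """Compare two tag -> score mappings (hypothetical helper, not an imgutils API).

    Returns the largest absolute score gap over the tags present in both
    mappings, plus the sorted list of tags present in only one of them.
    """
    shared = expected.keys() & actual.keys()
    max_diff = max((abs(expected[k] - actual[k]) for k in shared), default=0.0)
    only_one = sorted(expected.keys() ^ actual.keys())
    return max_diff, only_one


# Made-up scores standing in for `r` and `rating` from the example above.
r_demo = {'general': 0.9012, 'sensitive': 0.0951, 'questionable': 0.0030, 'explicit': 0.0007}
rating_demo = {'general': 0.9008, 'sensitive': 0.0955, 'questionable': 0.0031, 'explicit': 0.0006}

max_diff, only_one = compare_predictions(r_demo, rating_demo)
print('max abs diff:', max_diff)
print('tags in only one result:', only_one)
```

With real outputs, the same call would be `compare_predictions(r, rating)`, `compare_predictions(g, general)` and `compare_predictions(c, character)`.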

| Name | Tagger | Embedding Width | Tags Count | FLOPS | Params | EMB Cosine | EMB Norm | Pred Loss | Pred MSE |
|---|---|---|---|---|---|---|---|---|---|
| ViT_v3_mnum2_all | ViT_v3 | 768 | 10861 | 0.000398G | 0.40M | 1 | 0.1712 | 0.004306 | 2.116e-08 |
| ViT_v3_mnum1_all | ViT_v3 | 768 | 10861 | 0.000709G | 0.71M | 1 | 0.2246 | 0.004306 | 3.991e-08 |
| ConvNext_v3_mnum2_all | ConvNext_v3 | 1024 | 10861 | 0.000708G | 0.71M | 1 | 0.1126 | 0.004531 | 2.061e-08 |
| ConvNext_v3_mnum1_all | ConvNext_v3 | 1024 | 10861 | 0.001260G | 1.26M | 1 | 0.1473 | 0.004531 | 3.539e-08 |
| ViT_mnum2_all | ViT | 768 | 9083 | 0.000398G | 0.40M | 1 | 0.08641 | 0.005199 | 3.797e-09 |
| ViT_mnum1_all | ViT | 768 | 9083 | 0.000709G | 0.71M | 1 | 0.1724 | 0.005199 | 1.896e-08 |
| ConvNext_mnum2_all | ConvNext | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.05776 | 0.005213 | 7.207e-09 |
| ConvNext_mnum1_all | ConvNext | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.07134 | 0.005214 | 1.292e-08 |
| ViT_Large_mnum2_all | ViT_Large | 1024 | 10861 | 0.000708G | 0.71M | 1 | 1.403 | 0.003966 | 1.617e-07 |
| ViT_Large_mnum1_all | ViT_Large | 1024 | 10861 | 0.001260G | 1.26M | 1 | 1.643 | 0.003966 | 2.24e-07 |
| SwinV2_mnum2_all | SwinV2 | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.1257 | 0.004726 | 3.797e-08 |
| SwinV2_mnum1_all | SwinV2 | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.1497 | 0.004727 | 5.487e-08 |
| EVA02_Large_mnum2_all | EVA02_Large | 1024 | 10861 | 0.000708G | 0.71M | 1 | 1.268 | 0.005948 | 5.466e-08 |
| EVA02_Large_mnum1_all | EVA02_Large | 1024 | 10861 | 0.001260G | 1.26M | 1 | 1.713 | 0.005948 | 9.518e-08 |
| ConvNextV2_mnum2_all | ConvNextV2 | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.09014 | 0.004596 | 1.43e-08 |
| ConvNextV2_mnum1_all | ConvNextV2 | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.1216 | 0.004596 | 2.76e-08 |
| SwinV2_v3_mnum2_all | SwinV2_v3 | 1024 | 10861 | 0.000708G | 0.71M | 1 | 0.2129 | 0.004128 | 4.035e-08 |
| SwinV2_v3_mnum1_all | SwinV2_v3 | 1024 | 10861 | 0.001260G | 1.26M | 1 | 0.2784 | 0.004129 | 6.893e-08 |
| MOAT_mnum2_all | MOAT | 1024 | 9083 | 0.000708G | 0.71M | 1 | 0.4662 | 0.004998 | 1.855e-08 |
| MOAT_mnum1_all | MOAT | 1024 | 9083 | 0.001260G | 1.26M | 1 | 0.7849 | 0.004998 | 5.549e-08 |
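
The table does not define its EMB Cosine and EMB Norm columns. One plausible reading, which is an assumption and not something this card confirms, is that EMB Cosine is the cosine similarity between the original and the de-normalized embedding, and EMB Norm is the gap between their L2 norms. Under that reading, a rough sketch of how such figures could be computed for a single pair of embeddings is shown below. `embedding_agreement` is a hypothetical helper, and the vectors are synthetic.

```python
import numpy as np


def embedding_agreement(original, recovered):
    """Cosine similarity and absolute L2-norm gap between two embeddings.

    Hypothetical helper for illustration; the metric definitions are assumed.
    """
    original = np.asarray(original, dtype=np.float64)
    recovered = np.asarray(recovered, dtype=np.float64)
    norm_original = np.linalg.norm(original)
    norm_recovered = np.linalg.norm(recovered)
    cosine = float(original @ recovered / (norm_original * norm_recovered))
    norm_gap = float(abs(norm_original - norm_recovered))
    return cosine, norm_gap


# Synthetic stand-ins: the recovered vector keeps the direction of the
# original but its length is off by 1%.
rng = np.random.default_rng(0)
original = rng.normal(size=768)
recovered = original * 1.01

cosine, norm_gap = embedding_agreement(original, recovered)
print('cosine similarity:', cosine)
print('norm gap:', norm_gap)
```

Rescaling a vector never changes its direction, so a recovered embedding that differs from the original only in length has a cosine similarity of 1, and the remaining error shows up in the norm alone.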
