deepghs/wd14_tagger_embedding_denormalize

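Models from the experiment at https://github.com/deepghs/tagger_embedding_aligner. They de-normalize L2-normalized WD14 tagger embeddings so that tag predictions can be recovered from them.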

import numpy as np

from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction, denormalize_wd14_emb

embedding, (r, g, c) = get_wd14_tags(
    '/my/image.png',
    fmt=('embedding', ('rating', 'general', 'character')),
)
# tag predictions from the original, un-normalized embedding
print('Expected result:')
print(r)
print(g)
print(c)

# L2-normalize the embedding (this discards its original norm)
embedding = embedding / np.linalg.norm(embedding)
# predictions computed directly from the normalized embedding are wrong
br, bg, bc = convert_wd14_emb_to_prediction(embedding)
print('Bad results due to the embedding normalization:')
print(br)
print(bg)
print(bc)

# de-normalize the normalized embedding
output = denormalize_wd14_emb(embedding)
print(output.shape)

# should closely match r, g, c (error on the order of 1e-3)
rating, general, character = convert_wd14_emb_to_prediction(output)
print('De-normalized result:')
print(rating)
print(general)
print(character)
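
To check the "approx 1e-3 error" claim numerically rather than by eye, the two sets of results can be compared tag by tag. The following is a minimal sketch, assuming the prediction results are dict-like mappings from tag name to score. `compare_predictions` is a hypothetical helper, not part of `imgutils`, and the scores below are made-up stand-ins for `g` and `general`.

```python
from typing import Dict, Set, Tuple


def compare_predictions(
    expected: Dict[str, float], actual: Dict[str, float]
) -> Tuple[float, Set[str]]:
    """Return (max absolute score difference over shared tags,
    tags that appear in only one of the two results)."""
    shared = expected.keys() & actual.keys()
    only_one = expected.keys() ^ actual.keys()
    max_diff = max((abs(expected[t] - actual[t]) for t in shared), default=0.0)
    return max_diff, set(only_one)


# Hypothetical scores, standing in for `g` and `general` from the snippet above.
expected = {'1girl': 0.9981, 'solo': 0.9712, 'smile': 0.3561}
actual = {'1girl': 0.9979, 'solo': 0.9715, 'hat': 0.3502}

max_diff, mismatched = compare_predictions(expected, actual)
print(f'max abs diff on shared tags: {max_diff:.4f}')
print(f'tags present in only one result: {sorted(mismatched)}')
```

Tags whose scores sit close to the detection threshold may appear in only one of the two results, which is why the helper reports them separately instead of treating a missing tag as a score of zero.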

Name	Tagger	Embedding Width	Tags Count	FLOPS	Params	EMB Cosine	EMB Norm	Pred Loss	Pred MSE
ViT_v3_mnum2_all	ViT_v3	768	10861	0.000398G	0.40M	1	0.1712	0.004306	2.116e-08
ViT_v3_mnum1_all	ViT_v3	768	10861	0.000709G	0.71M	1	0.2246	0.004306	3.991e-08
ConvNext_v3_mnum2_all	ConvNext_v3	1024	10861	0.000708G	0.71M	1	0.1126	0.004531	2.061e-08
ConvNext_v3_mnum1_all	ConvNext_v3	1024	10861	0.001260G	1.26M	1	0.1473	0.004531	3.539e-08
ViT_mnum2_all	ViT	768	9083	0.000398G	0.40M	1	0.08641	0.005199	3.797e-09
ViT_mnum1_all	ViT	768	9083	0.000709G	0.71M	1	0.1724	0.005199	1.896e-08
ConvNext_mnum2_all	ConvNext	1024	9083	0.000708G	0.71M	1	0.05776	0.005213	7.207e-09
ConvNext_mnum1_all	ConvNext	1024	9083	0.001260G	1.26M	1	0.07134	0.005214	1.292e-08
ViT_Large_mnum2_all	ViT_Large	1024	10861	0.000708G	0.71M	1	1.403	0.003966	1.617e-07
ViT_Large_mnum1_all	ViT_Large	1024	10861	0.001260G	1.26M	1	1.643	0.003966	2.24e-07
SwinV2_mnum2_all	SwinV2	1024	9083	0.000708G	0.71M	1	0.1257	0.004726	3.797e-08
SwinV2_mnum1_all	SwinV2	1024	9083	0.001260G	1.26M	1	0.1497	0.004727	5.487e-08
EVA02_Large_mnum2_all	EVA02_Large	1024	10861	0.000708G	0.71M	1	1.268	0.005948	5.466e-08
EVA02_Large_mnum1_all	EVA02_Large	1024	10861	0.001260G	1.26M	1	1.713	0.005948	9.518e-08
ConvNextV2_mnum2_all	ConvNextV2	1024	9083	0.000708G	0.71M	1	0.09014	0.004596	1.43e-08
ConvNextV2_mnum1_all	ConvNextV2	1024	9083	0.001260G	1.26M	1	0.1216	0.004596	2.76e-08
SwinV2_v3_mnum2_all	SwinV2_v3	1024	10861	0.000708G	0.71M	1	0.2129	0.004128	4.035e-08
SwinV2_v3_mnum1_all	SwinV2_v3	1024	10861	0.001260G	1.26M	1	0.2784	0.004129	6.893e-08
MOAT_mnum2_all	MOAT	1024	9083	0.000708G	0.71M	1	0.4662	0.004998	1.855e-08
MOAT_mnum1_all	MOAT	1024	9083	0.001260G	1.26M	1	0.7849	0.004998	5.549e-08
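
The column definitions are not stated here. As one plausible reading only, "EMB Cosine" could be the cosine similarity between the original and recovered embeddings, "EMB Norm" an error in their norms, and "Pred MSE" the mean squared error between the two sets of predictions. The sketch below shows how such quantities might be computed on toy data; the function names and the random vectors are illustrative assumptions, not the experiment's evaluation code.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def norm_error(a: np.ndarray, b: np.ndarray) -> float:
    """Absolute difference between the L2 norms of two vectors."""
    return float(abs(np.linalg.norm(a) - np.linalg.norm(b)))


def mean_squared_error(a: np.ndarray, b: np.ndarray) -> float:
    """Mean of the element-wise squared differences."""
    return float(np.mean((a - b) ** 2))


# Toy data: a random "original" embedding and a "recovered" one whose
# direction is identical but whose scale is off by 1%.
rng = np.random.default_rng(0)
original = rng.normal(size=768)
recovered = original * 1.01

print('cosine similarity:', cosine_similarity(original, recovered))
print('norm error:', norm_error(original, recovered))
print('embedding MSE:', mean_squared_error(original, recovered))
```

In this toy case the cosine similarity is 1 up to floating-point error while the norm error is nonzero, which matches the pattern in the table: direction is preserved exactly, and the remaining error lies in the recovered scale.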

