Update README.md
README.md CHANGED
````diff
@@ -29,5 +29,5 @@ outputs = vision_tower(**inputs)
 logits_per_image = outputs.pooler_output # this is the image-text similarity score
 ```

-There's still a slight
+There's still a slight difference: HF's CLIPVision model uses the [CLS] embedding as the pooled embedding, while SigLIP uses a global attention pooler to get the final latent feature.
````
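For readers comparing the two pooling strategies the commit describes, here is a minimal sketch using the standard `transformers` `CLIPVisionModel` and `SiglipVisionModel` APIs. The checkpoint names are illustrative stand-ins, not necessarily the vision tower this repo ships, so treat them as assumptions:

```python
# Sketch contrasting the two pooling paths; checkpoint names are illustrative
# and may differ from the vision tower this repo actually uses.
import torch
from PIL import Image
from transformers import AutoProcessor, CLIPVisionModel, SiglipVisionModel

image = Image.new("RGB", (224, 224))  # placeholder input for illustration

# CLIP: pooler_output is the [CLS] token's hidden state passed through a
# final layernorm -- a single learned token summarizes the whole image.
clip = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
with torch.no_grad():
    clip_out = clip(**clip_proc(images=image, return_tensors="pt"))
print(clip_out.pooler_output.shape)  # (1, hidden_size)

# SigLIP: there is no [CLS] token; pooler_output comes from a learned
# attention-pooling head (a probe query attending over all patch tokens).
siglip = SiglipVisionModel.from_pretrained("google/siglip-base-patch16-224")
siglip_proc = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")
with torch.no_grad():
    siglip_out = siglip(**siglip_proc(images=image, return_tensors="pt"))
print(siglip_out.pooler_output.shape)  # (1, hidden_size)
```

In practice this means SigLIP's pooled feature is produced by extra learned parameters (the attention-pooling head) rather than a token slice, so the two `pooler_output` tensors are not drop-in interchangeable even when the hidden sizes match.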