ikala-ray committed on
Commit a24f9d2 · 1 Parent(s): 8b215c4

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -29,5 +29,5 @@ outputs = vision_tower(**inputs)
 logits_per_image = outputs.pooler_output # this is the image-text similarity score
 ```
 
-There's still a slight different where hf's CLIPVision model uses a [CLS] embedding as pool embedding while SigLIP uses global attention pooler to get the final latent feature.
+There's still a slight difference where hf's CLIPVision model uses a [CLS] embedding as pool embedding while SigLIP uses global attention pooler to get the final latent feature.
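The changed line above contrasts two pooling strategies: CLIP-style models take the first ([CLS]) token as the pooled feature, while SigLIP uses a learned attention pooler over all tokens. The following is a minimal illustrative sketch of that difference in plain Python — a heavily simplified single-head pooler with a hypothetical `probe` vector and no learned key/value projections, not the actual SigLIP implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cls_pool(tokens):
    # CLIP-style pooling: the pooled feature is simply the first
    # ([CLS]) token embedding; the other tokens are ignored.
    return tokens[0]

def attention_pool(tokens, probe):
    # SigLIP-style pooling (simplified): a learned probe vector attends
    # over ALL tokens, and the pooled feature is the attention-weighted
    # sum of the token embeddings.
    d = len(probe)
    scores = [sum(p * t for p, t in zip(probe, tok)) / math.sqrt(d)
              for tok in tokens]
    weights = softmax(scores)  # one weight per token, summing to 1
    return [sum(w * tok[i] for w, tok in zip(weights, tokens))
            for i in range(d)]

# Toy sequence of 3 token embeddings of dimension 2.
tokens = [
    [1.0, 0.0],   # first token plays the [CLS] role
    [0.0, 1.0],
    [0.5, 0.5],
]
probe = [1.0, 0.0]  # hypothetical learned probe

print(cls_pool(tokens))              # [1.0, 0.0]
print(attention_pool(tokens, probe))
```

With [CLS] pooling the output is exactly the first token, whereas the attention pooler mixes information from every token, which is why the two `pooler_output` tensors are not interchangeable even when the backbones match.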