Zero-Shot Image Classification
Safetensors
clip
zer0int committed on
Commit
cfec7d6
1 Parent(s): 6e3d306

Update README.md

  1. README.md +1 -5
README.md CHANGED
@@ -13,12 +13,9 @@ pipeline_tag: zero-shot-image-classification
  - Required: Use with my [zer0int/ComfyUI-HunyuanVideo-Nyan](https://github.com/zer0int/ComfyUI-HunyuanVideo-Nyan) node (changes influence of LLM vs. CLIP; otherwise, difference is very little).
  - ☕ [Buy me a coffee](https://ko-fi.com/zer0int)
 
-
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/HeMdxok8uKVA87BJqHpS9.png)
 
- The original CLIP model has 77 tokens max input - but only ~20 tokens effective length. See the [original Long-CLIP paper](https://arxiv.org/abs/2403.15378) for details.
-
- HunyuanVideo demo:
+ The original CLIP model has 77 tokens max input - but only ~20 tokens effective length. See the [original Long-CLIP paper](https://arxiv.org/abs/2403.15378) for details. HunyuanVideo demo:
 
  69 tokens, normal scene:
  - Lens: 16mm. Aperture: f/2.8. Color Grading: Blue-green monochrome. Lighting: Low-key with backlit silhouettes. Background: Gothic cathedral at night, stained glass windows breaking. Camera angle: Over the shoulder of a ninja, tracking her mid-air leap as she lands on a rooftop.
@@ -26,7 +23,6 @@ HunyuanVideo demo:
  52 tokens, OOD (Out-of-Distribution) scene: Superior handling for consistency and prompt-following despite OOD concept.
  - In this surreal nightmare documentary, a sizable spider with a human face is peacefully savoring her breakfast at a diner. The spider has a spider body, but a lady's face on the front, and regular human hands at the end of the spider legs.
 
-
  <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/J1_xaDybbnF9UCBGxuKAc.mp4"></video>
 
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6490359a877fc29cb1b09451/awPdlSxGFOrs_kanLbaW_.png)
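
The README text in this diff cites two numbers for the original CLIP text encoder: a 77-token maximum input and roughly 20 tokens of effective length. As a rough illustration of what that implies for the two demo prompts (69 and 52 tokens), here is a minimal sketch. The function name and the returned labels are hypothetical and not part of the repository, and the token counts are taken from the README as stated rather than recomputed. In practice a count would come from a CLIP tokenizer (for example `CLIPTokenizer` in the `transformers` library), and whether the README's counts include start and end tokens is not stated.

```python
# Illustrative sketch only. The two thresholds are the figures quoted in the
# README; the function name and labels are hypothetical.
CLIP_MAX_TOKENS = 77        # maximum input length of the original CLIP
CLIP_EFFECTIVE_TOKENS = 20  # approximate effective length per the README


def classify_prompt_length(n_tokens: int) -> str:
    """Place a prompt's token count relative to the two README figures."""
    if n_tokens < 0:
        raise ValueError("token count must be non-negative")
    if n_tokens > CLIP_MAX_TOKENS:
        return "exceeds max input"
    if n_tokens > CLIP_EFFECTIVE_TOKENS:
        return "fits, but beyond effective length"
    return "within effective length"


# Token counts as quoted for the two demo prompts in the README.
for label, n_tokens in [("normal scene", 69), ("OOD scene", 52)]:
    print(f"{label}: {n_tokens} tokens -> {classify_prompt_length(n_tokens)}")
```

Both demo prompts fall in the middle band: short enough to fit in 77 tokens, but well past the ~20-token effective length the README attributes to the original CLIP.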