---
license: apache-2.0
pipeline_tag: zero-shot-image-classification
library_name: openclip
---

# LongCLIP model

This repository contains the weights of the LongCLIP model, a fine-tuned version of CLIP that extends the effective text input length from 77 to 248 tokens, enabling long-caption image-text retrieval and zero-shot classification.

Paper: https://huggingface.co/papers/2403.15378

GitHub repository: https://github.com/beichenzbc/long-clip

## Installation

```bash
git clone https://github.com/beichenzbc/Long-CLIP.git
cd Long-CLIP
```

## Usage

```python
from huggingface_hub import hf_hub_download
from PIL import Image
import torch

# `longclip` lives in the Long-CLIP repository cloned above,
# so run this script from inside the Long-CLIP directory.
from model import longclip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Download the checkpoint from the Hub and load the model and its preprocessor.
filepath = hf_hub_download(repo_id="BeichenZhang/LongCLIP-L", filename="longclip-L.pt")
model, preprocess = longclip.load(filepath, device=device)

text = longclip.tokenize([
    "A man is crossing the street with a red car parked nearby.",
    "A man is driving a car in an urban scene.",
]).to(device)
image = preprocess(Image.open("./img/demo.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Normalize so the dot product below is a cosine similarity.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    logits_per_image = image_features @ text_features.T
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)
```
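The last two lines of the snippet reduce zero-shot classification to a cosine similarity followed by a softmax. As a minimal, model-free sketch of that step (random vectors stand in for the real embeddings; the feature dimension of 768 is an illustrative assumption):

```python
import numpy as np

def label_probs(image_feat: np.ndarray, text_feats: np.ndarray) -> np.ndarray:
    """Cosine similarity between one image and N captions, turned into probabilities."""
    # Normalize so dot products become cosine similarities.
    image_feat = image_feat / np.linalg.norm(image_feat)
    text_feats = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    logits = text_feats @ image_feat          # shape (N,)
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
probs = label_probs(rng.normal(size=768), rng.normal(size=(2, 768)))
print(probs)  # two non-negative probabilities summing to 1
```

Note that the full CLIP-style model additionally multiplies the logits by a learned temperature (`logit_scale`) before the softmax, which sharpens the resulting distribution.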