noamrot committed on
Commit 6540118 · 1 Parent(s): 1a4b1f8

add python example

Files changed (1)
  1. README.md +30 -8
README.md CHANGED
@@ -14,19 +14,41 @@ A framework designed to generate semantically rich image captions.
  
  - 🚀 **Demo**: Try out our BLIP-based model [demo](https://huggingface.co/spaces/noamrot/FuseCap) trained using FuseCap, hosted on Huggingface Spaces.
  
+ #### Running the model
+
+ Our BLIP-based model can be run using the following code:
+
+ ```python
+ import requests
+ from PIL import Image
+ from transformers import BlipProcessor, BlipForConditionalGeneration
+ import torch
+
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+ processor = BlipProcessor.from_pretrained("noamrot/FuseCap")
+ model = BlipForConditionalGeneration.from_pretrained("noamrot/FuseCap").to(device)
+
+ img_url = 'https://huggingface.co/spaces/noamrot/FuseCap/resolve/main/bike.jpg'
+ raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
+
+ text = "a picture of "
+ inputs = processor(raw_image, text, return_tensors="pt").to(device)
+
+ out = model.generate(**inputs, num_beams=3)
+ print(processor.decode(out[0], skip_special_tokens=True))
+ ```
+
  ## Upcoming Updates
  
- The official codebase and trained models for this project will be released soon.
+ The official codebase, datasets and trained models for this project will be released soon.
  
  ## BibTeX
  
  ``` Citation
- @misc{rotstein2023fusecap,
- title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions},
- author={Noam Rotstein and David Bensaid and Shaked Brody and Roy Ganz and Ron Kimmel},
- year={2023},
- eprint={2305.17718},
- archivePrefix={arXiv},
- primaryClass={cs.CV}
+ @article{rotstein2023fusecap,
+ title={FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions},
+ author={Rotstein, Noam and Bensaid, David and Brody, Shaked and Ganz, Roy and Kimmel, Ron},
+ journal={arXiv preprint arXiv:2305.17718},
+ year={2023}
  }
  ```
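
A possible follow-up, sketched here and not part of this commit: the README snippet runs everything at module level, so captioning a second image means repeating the processor, generate and decode calls. The sketch below wraps those three calls in a function. The names `generate_caption` and `load_fusecap` are hypothetical, chosen only for illustration; the checkpoint name, prompt, and `num_beams=3` are taken from the added README code. It assumes the processor returns a mapping with a `.to(device)` method, as `BlipProcessor` does.

```python
from typing import Any


def generate_caption(
    image: Any,
    processor: Any,
    model: Any,
    prompt: str = "a picture of ",
    num_beams: int = 3,
    device: str = "cpu",
) -> str:
    """Caption one image with an already-loaded BLIP-style processor and model.

    Hypothetical helper: it only regroups the calls shown in the README snippet.
    """
    # Preprocess the image, tokenize the prompt, and move the tensors to the device.
    inputs = processor(image, prompt, return_tensors="pt").to(device)
    # Generate the caption as a continuation of the prompt using beam search.
    out = model.generate(**inputs, num_beams=num_beams)
    # Decode the first returned sequence and trim surrounding whitespace.
    return processor.decode(out[0], skip_special_tokens=True).strip()


def load_fusecap(device: str = "cpu"):
    """Load the checkpoint named in the README (hypothetical helper).

    The transformers import is deferred so that importing this module does not
    require the library or a network connection.
    """
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained("noamrot/FuseCap")
    model = BlipForConditionalGeneration.from_pretrained("noamrot/FuseCap").to(device)
    model.eval()  # switch off dropout for inference
    return processor, model
```

With `raw_image` loaded as in the README, usage would look like `processor, model = load_fusecap()` followed by `print(generate_caption(raw_image, processor, model))`. Passing the processor and model in as arguments keeps the download out of the captioning path, so the helper can be exercised with stand-in objects.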