dmedhi committed on
Commit 9c430a5 · verified · 1 Parent(s): f61583b

Update README.md

Files changed (1)
  1. README.md +34 -6
README.md CHANGED
@@ -1,10 +1,38 @@
1
  ---
2
- pipeline_tag: text-generation
3
  tags:
4
- - model_hub_mixin
5
- - pytorch_model_hub_mixin
 
 
6
  ---
7
 
8
- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
9
- - Library: [More Information Needed]
10
- - Docs: [More Information Needed]
 
1
  ---
2
+ pipeline_tag: image-text-to-text
3
  tags:
4
+ - florence2
5
+ - smollm
6
+ - custom_code
7
+ license: apache-2.0
8
  ---
9
+ ## FloSmolV
10
+
11
+ A vision model for **Image-text to Text** generation produced by combining [HuggingFaceTB/SmolLM-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-360M-Instruct) and [microsoft/Florence-2-base](https://huggingface.co/microsoft/Florence-2-base).
12
+
13
+ **Florence-2-base** generates text (captions) from input images quickly. This text can then be passed to a large language model to
14
+ answer questions. **SmolLM-360M** is a compact model from the Hugging Face team that quickly generates text for input queries. The two models are combined to produce a
15
+ Visual Question Answering model that answers questions about images.
16
+
17
+ ## Usage
18
+
19
+ ### Transformers
20
+
21
+ Make sure to install the necessary dependencies first.
22
+
23
+ ```bash
24
+ pip install -qU transformers accelerate einops bitsandbytes flash_attn timm
25
+ ```
26
+ ```python
27
+ # load a free image from Pixabay
28
+ from PIL import Image
29
+ import requests
30
+ url = "https://cdn.pixabay.com/photo/2023/11/01/11/15/cable-car-8357178_640.jpg"
31
+ img = Image.open(requests.get(url, stream=True).raw)
32
+
33
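+ # download the model; trust_remote_code=True runs the custom modeling code from the repo, and .cuda() requires a CUDA GPU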
34
+ from transformers import AutoModelForCausalLM
35
+ model = AutoModelForCausalLM.from_pretrained("dmedhi/flosmolv", trust_remote_code=True).cuda()
36
+ model(img, "what is the object in the image?")
37
+ ```
38
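
The README describes a two-stage design: a captioning model turns the image into text, and a language model answers the question from that text. The sketch below illustrates only that composition. It is not the repository's actual code: `build_prompt`, `answer_from_image`, the prompt wording, and the stand-in `fake_captioner` and `fake_llm` callables are all hypothetical names chosen for illustration, and the real model may format its prompt differently.

```python
from typing import Any, Callable


def build_prompt(caption: str, question: str) -> str:
    """Combine an image caption and a user question into one text prompt.

    The prompt layout is an assumption made for this sketch.
    """
    return (
        f"Image description: {caption.strip()}\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


def answer_from_image(
    image: Any,
    question: str,
    captioner: Callable[[Any], str],
    llm: Callable[[str], str],
) -> str:
    """Caption the image first, then let the language model answer."""
    caption = captioner(image)  # stage 1: image -> text
    prompt = build_prompt(caption, question)
    return llm(prompt)  # stage 2: text -> answer


# Hypothetical stand-ins so the sketch runs without downloading any model.
def fake_captioner(image: Any) -> str:
    return "a red cable car above a valley"


def fake_llm(prompt: str) -> str:
    # Echo the first prompt line instead of generating text.
    return "echo: " + prompt.splitlines()[0]


result = answer_from_image(
    None, "what is the object in the image?", fake_captioner, fake_llm
)
print(result)
```

In a real pipeline, `captioner` would wrap the Florence-2 captioning call and `llm` would wrap SmolLM text generation. Passing both as callables keeps the two stages independently swappable.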