pipeline_tag: image-to-text
---

# BLIP-2, OPT-6.7b, Fine-tuned on COCO - Unofficial FP16 Version

This repository contains an unofficial version of the BLIP-2 model, leveraging [OPT-6.7b](https://huggingface.co/facebook/opt-6.7b), which has been fine-tuned on COCO and converted to FP16 for reduced model size and memory footprint.

The original model, BLIP-2, was introduced in the paper [BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models](https://arxiv.org/abs/2301.12597) by Li et al. and first released in [this repository](https://github.com/salesforce/LAVIS/tree/main/projects/blip2).

For a full description of the model, its intended uses and limitations, and instructions on usage with different hardware and precision settings, please refer to the [official model card](https://huggingface.co/Salesforce/blip2-opt-6.7b-coco).

## Unofficial FP16 Version

This version of the BLIP-2 model has been converted to FP16 precision, which roughly halves the model size and memory requirements compared to FP32. FP16 can also speed up inference on hardware with native FP16 support, although the reduced numerical precision may slightly affect the model's output quality.

This unofficial FP16 version is ideal for situations where storage, memory, or computational resources are limited.

Please note that this is an **unofficial** repository, neither maintained nor endorsed by the original authors of the model. The FP16 conversion was carried out independently, and any potential issues, limitations, or discrepancies with respect to the original model are not the responsibility of the original authors.

### How to use

Usage of this FP16 version is similar to that of the original model. For specific code examples, see the [documentation](https://huggingface.co/docs/transformers/main/en/model_doc/blip-2#transformers.Blip2ForConditionalGeneration.forward.example).
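
As a quick-start illustration, the snippet below loads the FP16 weights and generates a caption. It follows the standard BLIP-2 usage pattern from the `transformers` documentation; the `model_id` value is a placeholder (not a real repository name) that should be replaced with this repository's id, and the example image URL is the familiar COCO sample used in the docs.

```python
import requests
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "<this-repo-id>"  # placeholder: replace with this repository's model id

processor = Blip2Processor.from_pretrained(model_id)
# torch_dtype=torch.float16 keeps the stored FP16 weights from being upcast to FP32 on load.
model = Blip2ForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)
model.to(device)

# Image captioning: pass only the image, with no text prompt.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```

Note that FP16 inference is mainly intended for GPUs; on CPU-only machines you may need to fall back to the official FP32 checkpoint.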

Please be sure to test the performance and accuracy of this FP16 model thoroughly in your specific use case to confirm that it meets your needs.

This version can be used for tasks like:

- image captioning
- visual question answering (VQA)
- chat-like conversations by feeding the image and the previous conversation as a prompt to the model (see the sketch below)
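
For the last two tasks the prompt format matters. The sketch below continues from the captioning example above (same `processor`, `model`, `device`, and `image`) and uses the "Question: ... Answer:" convention described in the BLIP-2 documentation for the OPT-based checkpoints; the specific questions are only illustrative.

```python
# Visual question answering: prompt the model in the "Question: ... Answer:" format.
prompt = "Question: how many cats are there? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=20)
answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(answer)

# Chat-like use: fold the previous turn(s) back into the prompt and ask a follow-up question.
context = f"Question: how many cats are there? Answer: {answer}."
prompt = context + " Question: what are they lying on? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, torch.float16)
generated_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```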

*Disclaimer: This is an unofficial version of the model and any potential issues or discrepancies from the official model are not the responsibility of the original authors.*