Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ tags:
 
 ColPali is a model based on a novel architecture and training strategy built on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [PaliGemma-3B](https://huggingface.co/google/paligemma-3b-mix-448) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)-style multi-vector representations of text and images.
-It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models[
+It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models[add link]]() and first released in [this repository](https://github.com/ManuelFay/colpali)
 
 ## Model Description
 
@@ -36,8 +36,12 @@ with `alpha=32` and `r=32` on the transformer layers from the language model,
 as well as the final randomly initialized projection layer, and use a `paged_adamw_8bit` optimizer.
 We train on an 8-GPU setup with data parallelism, a learning rate of 5e-5 with linear decay and 2.5% warmup steps, and a batch size of 32.
 
+## Intended uses
+
+#TODO
+
+## Limitations
 
-## Intended uses & limitations
 - **Focus**: The model primarily focuses on PDF-type documents and high-resource languages, potentially limiting its generalization to other document types or less-represented languages.
 - **Support**: The model relies on multi-vector retrieval derived from the ColBERT late interaction mechanism, which may require engineering effort to adapt to widely used vector retrieval frameworks that lack native multi-vector support.
 
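
The **Support** limitation refers to ColBERT-style late interaction, where a query and a document are each represented by several vectors rather than one. As a rough illustration of why single-vector frameworks need adapting, here is a minimal sketch of MaxSim scoring in Python with NumPy. The function names `late_interaction_score` and `rank_documents` are hypothetical, the inputs are assumed to be plain 2-D arrays of embeddings, and this is not the repository's own implementation.

```python
import numpy as np


def late_interaction_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style MaxSim score (illustrative sketch, not the ColPali code).

    query_vecs: array of shape (n_query_tokens, dim)
    doc_vecs:   array of shape (n_doc_vectors, dim)
    """
    # Dot product between every query vector and every document vector.
    similarities = query_vecs @ doc_vecs.T  # shape: (n_query_tokens, n_doc_vectors)
    # For each query vector keep its best-matching document vector, then sum.
    return float(similarities.max(axis=1).sum())


def rank_documents(query_vecs: np.ndarray, docs: list) -> list:
    """Return document indices sorted from highest to lowest MaxSim score."""
    scores = [late_interaction_score(query_vecs, d) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Because the score needs a max over all document vectors for every query vector, an index that stores only one vector per document cannot compute it directly, which is the engineering effort the limitation describes.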