vidore/colpali-v1.2


manu committed on Aug 29, 2024

Commit 6629d45 · verified · 1 Parent(s): 1839908

Update README.md

Files changed (1)

README.md +12 -2


@@ -13,9 +13,14 @@ ColPali is a model based on a novel model architecture and training strategy bas
 It is a [PaliGemma-3B](https://huggingface.co/google/paligemma-3b-mix-448) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.
 It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali)
-This version has right padding to fix unwanted tokens in the query encoding.
+## Version specificity
+This version is trained with `colpali-engine==0.2.0`.
+Compared to `colpali`, this version is trained with right padding for queries to fix unwanted tokens in the query encoding.
 It also stems from the fixed `vidore/colpaligemma-3b-pt-448-base` to guarantee deterministic projection layer initialization.
-It was trained for 5 epochs, with in-batch negatives and hard mined negatives and a warmup of 1000 steps to help reduce non-english language collapse.
+It was trained for 5 epochs, with in-batch negatives and hard mined negatives and a warmup of 1000 steps (10x longer) to help reduce non-english language collapse.
 Data is the same as the ColPali data described in the paper.
@@ -45,6 +50,11 @@ We train on an 8 GPU setup with data parallelism, a learning rate of 5e-5 with l
 ## Usage
+```bash
+pip install colpali-engine==0.2.0
+```
 ```python
 import torch
 import typer
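
Note on the warmup wording: the updated README states a warmup of 1000 steps (10x longer), and the hunk context mentions a learning rate of 5e-5. To illustrate what a longer warmup means for early training, here is a minimal Python sketch. It assumes a linear ramp up to the peak rate and a previous warmup of 100 steps (inferred from "10x longer"). Neither the ramp shape nor the schedule after warmup is confirmed by this commit, and `warmup_lr` is a hypothetical helper, not part of `colpali-engine`.

```python
def warmup_lr(step: int, peak_lr: float = 5e-5, warmup_steps: int = 1000) -> float:
    """Learning rate at `step` under an ASSUMED linear warmup.

    The rate ramps linearly from 0 to `peak_lr` over `warmup_steps`, then
    stays at `peak_lr`. Any decay after warmup is deliberately left out,
    because the commit does not spell it out.
    """
    if warmup_steps <= 0:
        return peak_lr
    return peak_lr * min(1.0, step / warmup_steps)


if __name__ == "__main__":
    # Compare the assumed previous warmup (100 steps) with the new one (1000 steps).
    for step in (0, 50, 100, 500, 1000, 2000):
        short = warmup_lr(step, warmup_steps=100)
        long = warmup_lr(step, warmup_steps=1000)
        print(f"step={step:5d}  warmup=100: {short:.2e}  warmup=1000: {long:.2e}")
```

Under these assumptions, the 1000-step schedule is still at half the peak rate at step 500, whereas the 100-step schedule reaches the peak by step 100.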