Commit 6629d45 by manu
1 parent: 1839908

Update README.md

Files changed (1): README.md (+12 -2)
@@ -13,9 +13,14 @@ ColPali is a model based on a novel model architecture and training strategy bas
  It is a [PaliGemma-3B](https://huggingface.co/google/paligemma-3b-mix-448) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)-style multi-vector representations of text and images.
  It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali)
 
- This version has right padding to fix unwanted tokens in the query encoding.
+
+ ## Version specificity
+
+ This version is trained with `colpali-engine==0.2.0`.
+
+ Compared to `colpali`, this version is trained with right padding for queries to fix unwanted tokens in the query encoding.
  It also stems from the fixed `vidore/colpaligemma-3b-pt-448-base` to guarantee deterministic projection layer initialization.
- It was trained for 5 epochs, with in-batch negatives and hard mined negatives and a warmup of 1000 steps to help reduce non-English language collapse.
+ It was trained for 5 epochs, with in-batch negatives and hard mined negatives and a warmup of 1000 steps (10x longer) to help reduce non-English language collapse.
 
  Data is the same as the ColPali data described in the paper.
 
@@ -45,6 +50,11 @@ We train on an 8 GPU setup with data parallelism, a learning rate of 5e-5 with l
 
  ## Usage
 
+ ```bash
+ pip install colpali-engine==0.2.0
+ ```
+
+
  ```python
  import torch
  import typer
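
Since this change pins the checkpoint to `colpali-engine==0.2.0`, a loading script may want to confirm that the installed package matches the pin before running the usage snippet. The following is a minimal sketch and not part of the README: the helper name `engine_version_matches` is hypothetical, and the check relies only on the standard-library `importlib.metadata` module rather than on anything provided by `colpali-engine` itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Version string taken from the pip command added in this commit.
PINNED_VERSION = "0.2.0"


def engine_version_matches(
    package: str = "colpali-engine", expected: str = PINNED_VERSION
) -> bool:
    """Return True only if `package` is installed at exactly `expected`.

    Hypothetical helper for illustration; not part of colpali-engine.
    """
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed at all, so it cannot match the pin.
        return False


if __name__ == "__main__":
    if not engine_version_matches():
        print("Expected colpali-engine==0.2.0; run: pip install colpali-engine==0.2.0")
```

An exact string comparison is used here because the README pins a single release; a project that accepts a range of versions would need a proper version-specifier check instead.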