Visual Document Retrieval
ColPali
Safetensors
English
vidore-experimental
vidore
tonywu71 committed on
Commit 00a95b8 · verified · 1 Parent(s): 1281da3

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license: mit
+license: apache-2.0
 library_name: colpali
 base_model: vidore/colqwen2-base
 language:
@@ -14,11 +14,10 @@ pipeline_tag: visual-document-retrieval
 
 ### This is the base version trained with batch_size 256 instead of 32 for 5 epoch and with the updated pad token
 
-ColQwen is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
+ColQwen2 is a model based on a novel model architecture and training strategy based on Vision Language Models (VLMs) to efficiently index documents from their visual features.
 It is a [Qwen2-VL-2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) extension that generates [ColBERT](https://arxiv.org/abs/2004.12832)- style multi-vector representations of text and images.
 It was introduced in the paper [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449) and first released in [this repository](https://github.com/ManuelFay/colpali)
 
-This version is the untrained base version to guarantee deterministic projection layer initialization.
 <p align="center"><img width=800 src="https://github.com/illuin-tech/colpali/blob/main/assets/colpali_architecture.webp?raw=true"/></p>
 
 ## Version specificity