antoinelouis committed: Update README.md
--- a/README.md
+++ b/README.md
@@ -7,15 +7,16 @@ datasets:
 metrics:
 - recall
 tags:
-- feature-extraction
 - sentence-similarity
-
+- colbert
+base_model: camembert-base
+library_name: RAGatouille
 inference: false
 ---
 
-# colbertv1-camembert-base-mmarcoFR
+# 🇫🇷 colbertv1-camembert-base-mmarcoFR
 
-This is a [ColBERTv1](https://
+This is a [ColBERTv1](https://doi.org/10.48550/arXiv.2004.12832) model for semantic search. It encodes queries & passages into matrices of token-level embeddings and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
 
 ## Usage
 
@@ -77,6 +78,8 @@ RAG = RAGPretrainedModel.from_index(index_name) # if not already loaded
 RAG.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
 ```
 
+***
+
 ## Evaluation
 
 The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compared its performance to a single-vector representation model fine-tuned on the same dataset. We report the mean reciprocal rank (MRR) and recall at various cut-offs (R@k).
@@ -86,6 +89,8 @@ The model is evaluated on the smaller development set of mMARCO-fr, which consis
 | **colbertv1-camembert-base-mmarcoFR** | 🇫🇷 | 110M | 443MB | 29.51 | 54.21 | 80.00 | 88.40 |
 | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR) | 🇫🇷 | 110M | 443MB | 28.53 | 51.46 | 77.82 | 89.13 |
 
+***
+
 ## Training
 
 #### Data
@@ -107,7 +112,7 @@ to 128, and the maximum sequence lengths for questions and passages length were
 ```bibtex
 @online{louis2023,
 author = 'Antoine Louis',
-title = 'colbertv1-camembert-base-mmarcoFR: A ColBERTv1 Model
+title = 'colbertv1-camembert-base-mmarcoFR: A ColBERTv1 Model for French',
 publisher = 'Hugging Face',
 month = 'dec',
 year = '2023',
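
The description added in this commit says the model scores passages with MaxSim operators over token-level embeddings. As a rough illustration only, the sketch below computes a late-interaction score in plain Python: for each query token it takes the maximum cosine similarity over the passage tokens, then sums those maxima. The toy 2-dimensional embeddings and the helper names `l2_normalize` and `maxsim_score` are assumptions made for this example; they are not taken from the README or from the RAGatouille API.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_embs, passage_embs):
    """Sum, over query tokens, of the best cosine match among passage tokens."""
    q = [l2_normalize(v) for v in query_embs]
    p = [l2_normalize(v) for v in passage_embs]
    score = 0.0
    for qv in q:
        # Dot product of unit vectors is the cosine similarity.
        score += max(sum(a * b for a, b in zip(qv, pv)) for pv in p)
    return score


# Hypothetical token embeddings: one row per token.
query = [[1.0, 0.0], [0.0, 1.0]]
passage_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has an exact match for each query token
passage_b = [[1.0, 1.0]]                          # only a partial match for each query token

score_a = maxsim_score(query, passage_a)
score_b = maxsim_score(query, passage_b)
print(score_a > score_b)
```

Because every query token is matched independently, a passage is rewarded for covering each part of the query, which is the main difference from the single-vector bi-encoder that the evaluation table compares against.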