antoinelouis
committed on
Update README.md
README.md
CHANGED
@@ -105,25 +105,21 @@ with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
|
|
105 |
# results: tuple of tuples of length k containing ((passage_id, passage_rank, passage_score), ...)
|
106 |
```
|
107 |
|
108 |
-
***
|
109 |
-
|
110 |
## Evaluation
|
111 |
|
112 |
-
The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of
|
113 |
-
|
114 |
-
|
|
|
115 |
|
116 |
| model | #Param.(↓) | Size | Dim. | Index | R@1000 | R@500 | R@100 | R@10 | MRR@10 |
|
117 |
|:-----------------------------------------------------------------------------------------------------------|-----------:|------:|-----:|------:|-------:|------:|------:|-----:|-------:|
|
118 |
| **colbertv2-camembert-L4-mmarcoFR** | 54M | 0.2GB | 32 | 9GB | 91.9 | 90.3 | 81.9 | 56.7 | 32.3 |
|
119 |
| [FraColBERTv2](https://huggingface.co/bclavie/FraColBERTv2) | 111M | 0.4GB | 128 | 28GB | 90.0 | 88.9 | 81.2 | 57.1 | 32.4 |
|
120 |
| [colbertv1-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/colbertv1-camembert-base-mmarcoFR) | 111M | 0.4GB | 128 | 28GB | 89.7 | 88.4 | 80.0 | 54.2 | 29.5 |
|
121 |
-
| [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR) | 111M | 0.4GB | 128 | 28GB | - | 89.1 | 77.8 | 51.5 | 28.5 |
|
122 |
|
123 |
NB: Index corresponds to the size of the mMARCO-fr index (8.8M passages) on disk when using ColBERTv2's residual compression mechanism.
|
124 |
|
125 |
-
***
|
126 |
-
|
127 |
## Training
|
128 |
|
129 |
#### Data
|
@@ -144,17 +140,15 @@ H100 GPU for 325k steps using the AdamW optimizer with a batch size of 32, a pea
|
|
144 |
The embedding dimension is set to 32, and the maximum sequence lengths for questions and passages are fixed to 32 and 160 tokens, respectively. We use
|
145 |
the cosine similarity to compute relevance scores.
|
146 |
|
147 |
-
***
|
148 |
-
|
149 |
## Citation
|
150 |
|
151 |
```bibtex
|
152 |
-
@online{
|
153 |
-
|
154 |
-
|
155 |
-
|
156 |
-
|
157 |
-
|
158 |
-
|
159 |
}
|
160 |
```
|
|
|
105 |
# results: tuple of tuples of length k containing ((passage_id, passage_rank, passage_score), ...)
|
106 |
```
|
107 |
|
|
|
|
|
108 |
## Evaluation
|
109 |
|
110 |
+
The model is evaluated on the smaller development set of [mMARCO-fr](https://ir-datasets.com/mmarco.html#mmarco/v2/fr/), which consists of 6,980 queries for a corpus of
|
111 |
+
8.8M candidate passages. We report the mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), mean average precision (MAP), and recall at various cut-offs (R@k).
|
112 |
+
Below, we compare its performance with other publicly available French ColBERT models fine-tuned on the same dataset. To see how it compares to other neural retrievers in French,
|
113 |
+
check out the [*DécouvrIR*](https://huggingface.co/spaces/antoinelouis/decouvrir) leaderboard.
|
114 |
|
115 |
| model | #Param.(↓) | Size | Dim. | Index | R@1000 | R@500 | R@100 | R@10 | MRR@10 |
|
116 |
|:-----------------------------------------------------------------------------------------------------------|-----------:|------:|-----:|------:|-------:|------:|------:|-----:|-------:|
|
117 |
| **colbertv2-camembert-L4-mmarcoFR** | 54M | 0.2GB | 32 | 9GB | 91.9 | 90.3 | 81.9 | 56.7 | 32.3 |
|
118 |
| [FraColBERTv2](https://huggingface.co/bclavie/FraColBERTv2) | 111M | 0.4GB | 128 | 28GB | 90.0 | 88.9 | 81.2 | 57.1 | 32.4 |
|
119 |
| [colbertv1-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/colbertv1-camembert-base-mmarcoFR) | 111M | 0.4GB | 128 | 28GB | 89.7 | 88.4 | 80.0 | 54.2 | 29.5 |
|
|
|
120 |
|
121 |
NB: Index corresponds to the size of the mMARCO-fr index (8.8M passages) on disk when using ColBERTv2's residual compression mechanism.
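To make the reported metrics concrete, here is a minimal sketch of how MRR@k and R@k could be computed from search results in the `(passage_id, passage_rank, passage_score)` format described in the README. The function names, the `run`/`qrels` dictionaries, and the toy data are assumptions for illustration only; this is not the evaluation script used to produce the table, and it assumes ranks start at 1. The table reports these values as percentages, so the fractions below would be multiplied by 100.

```python
from typing import Dict, Sequence, Set, Tuple

# One ranked hit, mirroring the README's result format:
# (passage_id, passage_rank, passage_score). Ranks are assumed to start at 1.
Hit = Tuple[int, int, float]


def reciprocal_rank(hits: Sequence[Hit], relevant: Set[int], k: int = 10) -> float:
    """Return 1/rank of the best-ranked relevant passage in the top k, else 0."""
    ranks = [rank for pid, rank, _ in hits if rank <= k and pid in relevant]
    return 1.0 / min(ranks) if ranks else 0.0


def recall(hits: Sequence[Hit], relevant: Set[int], k: int) -> float:
    """Return the fraction of relevant passages retrieved in the top k."""
    found = {pid for pid, rank, _ in hits if rank <= k and pid in relevant}
    return len(found) / len(relevant)


def evaluate(
    run: Dict[str, Sequence[Hit]],
    qrels: Dict[str, Set[int]],
    mrr_k: int = 10,
    recall_ks: Sequence[int] = (10, 100, 500, 1000),
) -> Dict[str, float]:
    """Average each metric over the queries that have relevance judgments."""
    qids = [qid for qid, rel in qrels.items() if rel]
    scores = {
        f"MRR@{mrr_k}": sum(
            reciprocal_rank(run.get(qid, ()), qrels[qid], mrr_k) for qid in qids
        ) / len(qids)
    }
    for k in recall_ks:
        scores[f"R@{k}"] = sum(
            recall(run.get(qid, ()), qrels[qid], k) for qid in qids
        ) / len(qids)
    return scores


# Hypothetical toy data: two queries with three retrieved passages each.
toy_run = {
    "q1": ((11, 1, 0.9), (12, 2, 0.8), (13, 3, 0.7)),
    "q2": ((21, 1, 0.9), (22, 2, 0.5), (23, 3, 0.4)),
}
toy_qrels = {"q1": {12}, "q2": {23, 99}}

toy_scores = evaluate(toy_run, toy_qrels, mrr_k=10, recall_ks=(2, 3))
print(toy_scores)
```

In the toy example, the first relevant passage appears at rank 2 for `q1` and rank 3 for `q2`, so MRR@10 is the mean of 1/2 and 1/3.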
|
122 |
|
|
|
|
|
123 |
## Training
|
124 |
|
125 |
#### Data
|
|
|
140 |
The embedding dimension is set to 32, and the maximum sequence lengths for questions and passages are fixed to 32 and 160 tokens, respectively. We use
|
141 |
the cosine similarity to compute relevance scores.
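As an illustration of how cosine similarity can yield a relevance score in a ColBERT-style model, the sketch below applies the standard late-interaction (MaxSim) operator: each query token is matched to its most similar passage token, and the per-token maxima are summed. The function names and the tiny 2-dimensional vectors are assumptions chosen for readability; the actual model uses 32-dimensional token embeddings, and this is not its implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def maxsim_score(query_tokens: Sequence[Vector], passage_tokens: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best cosine match among passage tokens."""
    return sum(
        max(cosine(q, p) for p in passage_tokens)
        for q in query_tokens
    )


# Hypothetical toy embeddings (2 dimensions instead of 32, for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
passage = [[2.0, 0.0], [1.0, 1.0], [0.0, -1.0]]

score = maxsim_score(query, passage)
print(round(score, 4))
```

Because each cosine value is at most 1, the score of a query with n tokens is bounded by n, which is 32 for the maximum question length stated above.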
|
142 |
|
|
|
|
|
143 |
## Citation
|
144 |
|
145 |
```bibtex
|
146 |
+
@online{louis2024decouvrir,
|
147 |
+
author = {Antoine Louis},
|
148 |
+
title = {DécouvrIR: A Benchmark for Evaluating the Robustness of Information Retrieval Models in French},
|
149 |
+
publisher = {Hugging Face},
|
150 |
+
month = {mar},
|
151 |
+
year = {2024},
|
152 |
+
url = {https://huggingface.co/spaces/antoinelouis/decouvrir},
|
153 |
}
|
154 |
```
|