antoinelouis committed 04fb740 (verified) · 1 Parent(s): 91dba35

Update README.md

Files changed (1):
  1. README.md +11 -17
README.md CHANGED
@@ -105,25 +105,21 @@ with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
 # results: tuple of tuples of length k containing ((passage_id, passage_rank, passage_score), ...)
 ```
 
-***
-
 ## Evaluation
 
-The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compared its
-performance with other publicly available French ColBERT models (as well as one single-vector representation model) fine-tuned on the same dataset. We report the recall
-at various cut-offs (R@k) and the mean reciprocal rank at cut-off 10 (MRR@10).
+The model is evaluated on the smaller development set of [mMARCO-fr](https://ir-datasets.com/mmarco.html#mmarco/v2/fr/), which consists of 6,980 queries for a corpus of
+8.8M candidate passages. We report the mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), mean average precision (MAP), and recall at various cut-offs (R@k).
+Below, we compare its performance with other publicly available French ColBERT models fine-tuned on the same dataset. To see how it compares to other neural retrievers in French,
+check out the [*DécouvrIR*](https://huggingface.co/spaces/antoinelouis/decouvrir) leaderboard.
 
 | model                                                                                                        | #Param.(↓) |  Size | Dim. | Index | R@1000 | R@500 | R@100 | R@10 | MRR@10 |
 |:-----------------------------------------------------------------------------------------------------------|-----------:|------:|-----:|------:|-------:|------:|------:|-----:|-------:|
 | **colbertv2-camembert-L4-mmarcoFR**                                                                          |        54M | 0.2GB |   32 |   9GB |   91.9 |  90.3 |  81.9 | 56.7 |   32.3 |
 | [FraColBERTv2](https://huggingface.co/bclavie/FraColBERTv2)                                                  |       111M | 0.4GB |  128 |  28GB |   90.0 |  88.9 |  81.2 | 57.1 |   32.4 |
 | [colbertv1-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/colbertv1-camembert-base-mmarcoFR)  |       111M | 0.4GB |  128 |  28GB |   89.7 |  88.4 |  80.0 | 54.2 |   29.5 |
-| [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR)  |       111M | 0.4GB |  128 |  28GB |      - |  89.1 |  77.8 | 51.5 |   28.5 |
 
 NB: Index corresponds to the size of the mMARCO-fr index (8.8M passages) on disk when using ColBERTv2's residual compression mechanism.
 
-***
-
 ## Training
 
 #### Data
@@ -144,17 +140,15 @@ H100 GPU for 325k steps using the AdamW optimizer with a batch size of 32, a pea
 The embedding dimension is set to 32, and the maximum sequence lengths for questions and passages were fixed to 32 and 160 tokens, respectively. We use
 the cosine similarity to compute relevance scores.
 
-***
-
 ## Citation
 
 ```bibtex
-@online{louis2024,
-  author = 'Antoine Louis',
-  title = 'colbertv2-camembert-L4-mmarcoFR: A Lightweight ColBERTv2 Model for French',
-  publisher = 'Hugging Face',
-  month = 'mar',
-  year = '2024',
-  url = 'https://huggingface.co/antoinelouis/colbertv2-camembert-L4-mmarcoFR',
+@online{louis2024decouvrir,
+  author = 'Antoine Louis',
+  title = 'DécouvrIR: A Benchmark for Evaluating the Robustness of Information Retrieval Models in French',
+  publisher = 'Hugging Face',
+  month = 'mar',
+  year = '2024',
+  url = 'https://huggingface.co/spaces/antoinelouis/decouvrir',
 }
 ```
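
The evaluation section above reports MRR@10 and recall at various cut-offs (R@k). As a minimal per-query sketch of how these two metrics are computed (pure Python with hypothetical passage IDs, not the evaluation code used for the table):

```python
def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant passages that appear in the top k results."""
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical ranking for one query; the single relevant passage sits at rank 3.
ranked = [101, 205, 42, 7, 99]
relevant = {42}
print(mrr_at_k(ranked, relevant))        # 0.333... (1/3)
print(recall_at_k(ranked, relevant, 5))  # 1.0
```

The corpus-level scores in the table are these values averaged over all 6,980 development queries.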