antoinelouis committed 04fb740 (verified) · 1 Parent(s): 91dba35

Update README.md

Files changed (1):
  1. README.md +11 -17
README.md CHANGED
@@ -105,25 +105,21 @@ with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
 # results: tuple of tuples of length k containing ((passage_id, passage_rank, passage_score), ...)
 ```
 
-***
-
 ## Evaluation
 
-The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compared its
-performance with other publicly available French ColBERT models (as well as one single-vector representation model) fine-tuned on the same dataset. We report the recall
-at various cut-offs (R@k) and the mean reciprocal rank at cut-off 10 (MRR@10).
+The model is evaluated on the smaller development set of [mMARCO-fr](https://ir-datasets.com/mmarco.html#mmarco/v2/fr/), which consists of 6,980 queries for a corpus of
+8.8M candidate passages. We report the mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), mean average precision (MAP), and recall at various cut-offs (R@k).
+Below, we compare its performance with other publicly available French ColBERT models fine-tuned on the same dataset. To see how it compares to other neural retrievers in French,
+check out the [*DécouvrIR*](https://huggingface.co/spaces/antoinelouis/decouvrir) leaderboard.
 
 | model                                                                                                        | #Param.(↓) |  Size | Dim. | Index | R@1000 | R@500 | R@100 | R@10 | MRR@10 |
 |:-----------------------------------------------------------------------------------------------------------|-----------:|------:|-----:|------:|-------:|------:|------:|-----:|-------:|
 | **colbertv2-camembert-L4-mmarcoFR**                                                                          |        54M | 0.2GB |   32 |   9GB |   91.9 |  90.3 |  81.9 | 56.7 |   32.3 |
 | [FraColBERTv2](https://huggingface.co/bclavie/FraColBERTv2)                                                  |       111M | 0.4GB |  128 |  28GB |   90.0 |  88.9 |  81.2 | 57.1 |   32.4 |
 | [colbertv1-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/colbertv1-camembert-base-mmarcoFR)  |       111M | 0.4GB |  128 |  28GB |   89.7 |  88.4 |  80.0 | 54.2 |   29.5 |
-| [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR)  |       111M | 0.4GB |  128 |  28GB |      - |  89.1 |  77.8 | 51.5 |   28.5 |
 
 NB: Index corresponds to the size of the mMARCO-fr index (8.8M passages) on disk when using ColBERTv2's residual compression mechanism.
 
-***
-
 ## Training
 
 #### Data
@@ -144,17 +140,15 @@ H100 GPU for 325k steps using the AdamW optimizer with a batch size of 32, a pea
 The embedding dimension is set to 32, and the maximum sequence lengths for questions and passages were fixed to 32 and 160 tokens, respectively. We use
 the cosine similarity to compute relevance scores.
 
-***
-
 ## Citation
 
 ```bibtex
-@online{louis2024,
-  author = 'Antoine Louis',
-  title = 'colbertv2-camembert-L4-mmarcoFR: A Lightweight ColBERTv2 Model for French',
-  publisher = 'Hugging Face',
-  month = 'mar',
-  year = '2024',
-  url = 'https://huggingface.co/antoinelouis/colbertv2-camembert-L4-mmarcoFR',
+@online{louis2024decouvrir,
+  author = 'Antoine Louis',
+  title = 'DécouvrIR: A Benchmark for Evaluating the Robustness of Information Retrieval Models in French',
+  publisher = 'Hugging Face',
+  month = 'mar',
+  year = '2024',
+  url = 'https://huggingface.co/spaces/antoinelouis/decouvrir',
 }
 ```
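
The evaluation section above reports MRR@10 and recall at various cut-offs (R@k). As a minimal per-query sketch of how these two metrics are computed (pure Python with hypothetical passage IDs, not the evaluation code used for the table):

```python
def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant passages that appear in the top k results."""
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical ranking for one query; the single relevant passage sits at rank 3.
ranked = [101, 205, 42, 7, 99]
relevant = {42}
print(mrr_at_k(ranked, relevant))        # 0.333... (1/3)
print(recall_at_k(ranked, relevant, 5))  # 1.0
```

The corpus-level scores in the table are these values averaged over all 6,980 development queries.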