antoinelouis committed
Commit • 4a94c57 • 1 Parent(s): 513c436
Update README.md

README.md CHANGED
@@ -67,7 +67,7 @@ with Run().context(RunConfig(nranks=n_gpu,experiment=experiment)):
 
 ## Evaluation
 
-The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compared its performance
+The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compare its performance to a single-vector representation model fine-tuned on the same dataset. We report the mean reciprocal rank (MRR) and recall at various cut-offs (R@k).
 
 | model | Vocab. | #Param. | Size | MRR@10 | R@10 | R@100(↑) | R@500 |
 |:------------------------------------------------------------------------------------------------------------------------|:-------|--------:|------:|---------:|-------:|-----------:|--------:|
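As a point of reference for the metrics named above, MRR@10 and R@k can be computed from a retrieval run in the usual way. The sketch below is illustrative only; the `run`/`qrels` dictionaries and the toy values are assumptions, not part of the model card or its evaluation code:

```python
from typing import Dict, List, Set

def mrr_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant passage found in the top-k results."""
    total = 0.0
    for qid, ranking in run.items():
        for rank, pid in enumerate(ranking[:k], start=1):
            if pid in qrels.get(qid, set()):
                total += 1.0 / rank
                break
    return total / len(run)

def recall_at_k(run: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int) -> float:
    """Average fraction of a query's relevant passages retrieved in the top-k results."""
    total = 0.0
    for qid, ranking in run.items():
        relevant = qrels.get(qid, set())
        if relevant:
            total += len(relevant & set(ranking[:k])) / len(relevant)
    return total / len(run)

# Toy example: one query whose single relevant passage is ranked 2nd.
run = {"q1": ["p9", "p3", "p7"]}      # hypothetical ranked passage ids per query
qrels = {"q1": {"p3"}}                # hypothetical relevance judgments
print(mrr_at_k(run, qrels, k=10))     # 0.5
print(recall_at_k(run, qrels, k=10))  # 1.0
```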
@@ -87,6 +87,7 @@ The model is fine-tuned on the French version of the [mMARCO](https://huggingfac
 - a training set of ~533k unique queries (with at least one relevant passage);
 - a development set of ~101k queries;
 - a smaller dev set of 6,980 queries (which is actually used for evaluation in most published works).
+
 The triples are sampled from the ~39.8M triples from [triples.train.small.tsv](https://microsoft.github.io/msmarco/Datasets.html#passage-ranking-dataset). In the future, better negatives could be selected by exploiting the [msmarco-hard-negatives] dataset that contains 50 hard negatives mined from BM25 and 12 dense retrievers for each training query.
 
 ## Citation
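For illustration, drawing a training subset from the triples file mentioned in the diff could look like the sketch below. It assumes the common tab-separated `<query>\t<positive passage>\t<negative passage>` layout of triples.train.small.tsv; the file path and sample size are placeholders, not values used for this model:

```python
import random

def sample_triples(path: str, n_samples: int, seed: int = 42):
    """Reservoir-sample n_samples (query, positive, negative) triples from a large TSV."""
    rng = random.Random(seed)
    reservoir, seen = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip malformed lines
            seen += 1
            if len(reservoir) < n_samples:
                reservoir.append(tuple(parts))
            else:
                # Keep the new triple with probability n_samples / seen (Algorithm R).
                j = rng.randrange(seen)
                if j < n_samples:
                    reservoir[j] = tuple(parts)
    return reservoir

# Hypothetical usage (path and size are placeholders):
# triples = sample_triples("triples.train.small.tsv", n_samples=1_000_000)
```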