antoinelouis committed
Commit 009e5fb
1 Parent(s): a5da238

Update README.md

Files changed (1): README.md (+13 -57)
README.md CHANGED
@@ -6,14 +6,10 @@ datasets:
 - unicamp-dl/mmarco
 metrics:
 - recall
-- posicube/mean_reciprocal_rank
-
 tags:
-- sentence-transformers
 - feature-extraction
 - sentence-similarity
-- transformers
-
+library_name: sentence-transformers
 ---
 
 # biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR
@@ -42,8 +38,6 @@ embeddings = model.encode(sentences)
 print(embeddings)
 ```
 
-
-
 #### 🤗 Transformers
 
 Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
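
The body of this 🤗 Transformers snippet is elided by the diff, so here is a minimal sketch of the mean-pooling recipe the paragraph describes. Mean pooling is an assumption here (it is the usual choice for these biencoders); the model's `1_Pooling/config.json` holds the authoritative setting.

```python
from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, masking out padding positions.
    token_embeddings = model_output[0]  # contextualized word embeddings
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

model_id = "antoinelouis/biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["Ceci est une phrase d'exemple", "Chaque phrase est encodée"]
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
print("Sentence embeddings:")
print(sentence_embeddings)
```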
@@ -81,58 +75,22 @@ print("Sentence embeddings:")
 print(sentence_embeddings)
 ```
 
-
-
 ## Evaluation
 ***
 
-
-
-We evaluated our model on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages.
-| MRR@10 | NDCG@10 | MAP@10 | Recall@10 | Recall@100 | Recall@500 |
-|-------:|--------:|-------:|----------:|-----------:|-----------:|
-| 22.29 | 26.57 | 21.8 | 41.25 | 66.78 | 79.83 |
-
-
-Below, we compared its results with other biencoder models fine-tuned on the same dataset:
-|    | model | MRR@10 | NDCG@10 | MAP@10 | Recall@10 | Recall@100 (↑) | Recall@500 |
-|---:|:------|-------:|--------:|-------:|----------:|---------------:|-----------:|
-|  0 | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR) | 28.53 | 33.72 | 27.93 | 51.46 | 77.82 | 89.13 |
-|  1 | [biencoder-all-mpnet-base-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-mpnet-base-v2-mmarcoFR) | 28.04 | 33.28 | 27.5 | 51.07 | 77.68 | 88.67 |
-|  2 | [biencoder-multi-qa-mpnet-base-cos-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-multi-qa-mpnet-base-cos-v1-mmarcoFR) | 27.6 | 32.92 | 27.09 | 50.97 | 77.41 | 87.79 |
-|  3 | [biencoder-sentence-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-sentence-camembert-base-mmarcoFR) | 27.63 | 32.7 | 27.01 | 50.1 | 76.85 | 88.73 |
-|  4 | [biencoder-distilcamembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilcamembert-base-mmarcoFR) | 26.8 | 31.87 | 26.23 | 49.2 | 76.44 | 87.87 |
-|  5 | [biencoder-mpnet-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mpnet-base-mmarcoFR) | 27.2 | 32.22 | 26.63 | 49.41 | 75.71 | 86.88 |
-|  6 | [biencoder-multi-qa-distilbert-cos-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-multi-qa-distilbert-cos-v1-mmarcoFR) | 26.36 | 31.26 | 25.82 | 47.93 | 75.42 | 86.78 |
-|  7 | [biencoder-bert-base-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-base-uncased-mmarcoFR) | 26.3 | 31.14 | 25.74 | 47.67 | 74.57 | 86.33 |
-|  8 | [biencoder-msmarco-distilbert-cos-v5-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-msmarco-distilbert-cos-v5-mmarcoFR) | 25.75 | 30.63 | 25.24 | 47.22 | 73.96 | 85.64 |
-|  9 | [biencoder-all-distilroberta-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-distilroberta-v1-mmarcoFR) | 26.17 | 30.91 | 25.67 | 47.06 | 73.5 | 85.69 |
-| 10 | [biencoder-all-MiniLM-L6-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-MiniLM-L6-v2-mmarcoFR) | 25.49 | 30.39 | 24.99 | 47.1 | 73.48 | 86.09 |
-| 11 | [biencoder-distilbert-base-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilbert-base-uncased-mmarcoFR) | 25.18 | 29.83 | 24.64 | 45.77 | 73.16 | 85.13 |
-| 12 | [biencoder-msmarco-MiniLM-L12-cos-v5-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-msmarco-MiniLM-L12-cos-v5-mmarcoFR) | 26.22 | 30.99 | 25.69 | 47.29 | 73.09 | 84.95 |
-| 13 | [biencoder-roberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-roberta-base-mmarcoFR) | 25.94 | 30.72 | 25.43 | 46.98 | 73.07 | 84.76 |
-| 14 | [biencoder-distiluse-base-multilingual-cased-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distiluse-base-multilingual-cased-v1-mmarcoFR) | 24.57 | 29.08 | 24.04 | 44.51 | 72.54 | 85.13 |
-| 15 | [biencoder-multi-qa-MiniLM-L6-cos-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-multi-qa-MiniLM-L6-cos-v1-mmarcoFR) | 24.72 | 29.58 | 24.25 | 46.05 | 72.19 | 84.6 |
-| 16 | [biencoder-MiniLM-L12-H384-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-MiniLM-L12-H384-uncased-mmarcoFR) | 25.43 | 30.1 | 24.88 | 46.13 | 72.16 | 83.84 |
-| 17 | [biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR) | 24.74 | 29.41 | 24.23 | 45.4 | 71.52 | 84.42 |
-| 18 | [biencoder-electra-base-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-base-discriminator-mmarcoFR) | 24.77 | 29.37 | 24.21 | 45.2 | 70.84 | 83.25 |
-| 19 | [biencoder-bert-medium-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-medium-mmarcoFR) | 23.86 | 28.56 | 23.39 | 44.47 | 70.57 | 83.58 |
-| 20 | [biencoder-msmarco-MiniLM-L6-cos-v5-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-msmarco-MiniLM-L6-cos-v5-mmarcoFR) | 24.39 | 28.96 | 23.91 | 44.58 | 70.36 | 82.88 |
-| 21 | [biencoder-distilroberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilroberta-base-mmarcoFR) | 23.94 | 28.44 | 23.46 | 43.77 | 70.08 | 82.86 |
-| 22 | [biencoder-camemberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camemberta-base-mmarcoFR) | 24.78 | 29.24 | 24.23 | 44.58 | 69.59 | 82.18 |
-| 23 | [biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR) | 23.38 | 27.97 | 22.91 | 43.5 | 68.96 | 81.61 |
-| 24 | [biencoder-bert-small-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-small-mmarcoFR) | 22.4 | 26.84 | 21.95 | 41.96 | 68.88 | 82.14 |
-| 25 | [biencoder-mMiniLM-L6-v2-mmarcoFR-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLM-L6-v2-mmarcoFR-v2-mmarcoFR) | 22.87 | 27.26 | 22.37 | 42.3 | 68.78 | 81.39 |
-| 26 | [biencoder-MiniLM-L6-H384-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-MiniLM-L6-H384-uncased-mmarcoFR) | 22.86 | 27.34 | 22.41 | 42.62 | 68.4 | 81.54 |
-| 27 | [biencoder-deberta-v3-small-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-deberta-v3-small-mmarcoFR) | 22.44 | 26.84 | 21.97 | 41.84 | 68.17 | 80.9 |
-| 28 | **biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR** | 22.29 | 26.57 | 21.8 | 41.25 | 66.78 | 79.83 |
-| 29 | [biencoder-bert-mini-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-mini-mmarcoFR) | 20.06 | 24.09 | 19.66 | 37.78 | 64.27 | 77.39 |
-| 30 | [biencoder-electra-small-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-small-discriminator-mmarcoFR) | 20.32 | 24.36 | 19.9 | 38.16 | 63.98 | 77.23 |
-| 31 | [biencoder-deberta-v3-xsmall-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-deberta-v3-xsmall-mmarcoFR) | 17.7 | 21.29 | 17.31 | 33.59 | 58.76 | 73.45 |
-| 32 | [biencoder-bert-tiny-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-tiny-mmarcoFR) | 14.94 | 18.22 | 14.59 | 29.46 | 51.94 | 66.3 |
-| 33 | [biencoder-t5-small-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-t5-small-mmarcoFR) | 12.44 | 15.1 | 12.14 | 24.28 | 47.82 | 63.37 |
-
-
+We evaluated our model on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compare its performance with other biencoder models fine-tuned on the same dataset. We report the mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), mean average precision (MAP), and recall at various cut-offs (R@k).
+
+|   | model | Size | MRR@10 | NDCG@10 | MAP@10 | R@10 | R@100 (↑) | R@500 |
+|--:|:------|-----:|-------:|--------:|-------:|-----:|----------:|------:|
+| 1 | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR) | 443MB | 28.53 | 33.72 | 27.93 | 51.46 | 77.82 | 89.13 |
+| 2 | [biencoder-all-mpnet-base-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-mpnet-base-v2-mmarcoFR) | 438MB | 28.04 | 33.28 | 27.50 | 51.07 | 77.68 | 88.67 |
+| 3 | [biencoder-sentence-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-sentence-camembert-base-mmarcoFR) | 443MB | 27.63 | 32.70 | 27.01 | 50.10 | 76.85 | 88.73 |
+| 4 | [biencoder-distilcamembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilcamembert-base-mmarcoFR) | 272MB | 26.80 | 31.87 | 26.23 | 49.20 | 76.44 | 87.87 |
+| 5 | [biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR) | 471MB | 24.74 | 29.41 | 24.23 | 45.40 | 71.52 | 84.42 |
+| 6 | [biencoder-camemberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camemberta-base-mmarcoFR) | 447MB | 24.78 | 29.24 | 24.23 | 44.58 | 69.59 | 82.18 |
+| 7 | [biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR) | 440MB | 23.38 | 27.97 | 22.91 | 43.50 | 68.96 | 81.61 |
+| 8 | [biencoder-mMiniLM-L6-v2-mmarco-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLM-L6-v2-mmarco-mmarcoFR) | 428MB | 22.87 | 27.26 | 22.37 | 42.30 | 68.78 | 81.39 |
+| 9 | **biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR** | 428MB | 22.29 | 26.57 | 21.80 | 41.25 | 66.78 | 79.83 |
 
 ## Training
 ***
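
As an aside, the cut-off metrics reported above are simple to compute from a ranked list. The sketch below shows MRR@k and R@k on a hypothetical query; it is illustrative, not the evaluation script behind the table, whose numbers are presumably these per-query values averaged over the 6,980 dev queries and scaled by 100.

```python
def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Reciprocal rank of the first relevant passage in the top k, else 0.
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids, relevant_ids, k=10):
    # Fraction of all relevant passages that appear in the top k.
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

# Hypothetical run: passages ranked by the biencoder's similarity scores.
ranked = ["p7", "p2", "p9", "p4", "p1"]
relevant = {"p2", "p4"}
print(mrr_at_k(ranked, relevant))     # 0.5 -> first hit at rank 2
print(recall_at_k(ranked, relevant))  # 1.0 -> both relevant passages retrieved
```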
@@ -154,8 +112,6 @@ We used the French version of the [mMARCO](https://huggingface.co/datasets/unica
 - a smaller dev set of 6,980 queries (which is actually used for evaluation in most published works).
 Link: [https://ir-datasets.com/mmarco.html#mmarco/v2/fr/](https://ir-datasets.com/mmarco.html#mmarco/v2/fr/)
 
-
-
 ## Citation
 
 ```bibtex
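
For completeness, the dev split linked above can be pulled with the [ir_datasets](https://ir-datasets.com) package. A sketch, assuming `mmarco/v2/fr/dev/small` is the identifier for the 6,980-query set (the linked page lists the exact IDs):

```python
import ir_datasets

# Identifier assumed from the ir-datasets mMARCO page; verify it there.
dataset = ir_datasets.load("mmarco/v2/fr/dev/small")

for query in dataset.queries_iter():
    print(query.query_id, query.text)  # French dev queries
    break

for qrel in dataset.qrels_iter():
    print(qrel.query_id, qrel.doc_id, qrel.relevance)  # relevance judgments
    break
```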
 