Update README.md
Commit 009e5fb by antoinelouis
1 Parent(s): a5da238
README.md CHANGED
@@ -6,14 +6,10 @@ datasets:
|
|
6 |
- unicamp-dl/mmarco
|
7 |
metrics:
|
8 |
- recall
|
9 |
-
- posicube/mean_reciprocal_rank
|
10 |
-
|
11 |
tags:
|
12 |
-
- sentence-transformers
|
13 |
- feature-extraction
|
14 |
- sentence-similarity
|
15 |
-
-
|
16 |
-
|
17 |
---
|
18 |
|
19 |
# biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR
|
@@ -42,8 +38,6 @@ embeddings = model.encode(sentences)
|
|
42 |
print(embeddings)
|
43 |
```
|
44 |
|
45 |
-
|
46 |
-
|
47 |
#### 🤗 Transformers
|
48 |
|
49 |
Without [sentence-transformers](https://www.SBERT.net), you can use the model as follows: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
|
@@ -81,58 +75,22 @@ print("Sentence embeddings:")
|
|
81 |
print(sentence_embeddings)
|
82 |
```
|
83 |
|
84 |
-
|
85 |
-
|
86 |
## Evaluation
|
87 |
***
|
88 |
|
|
|
89 |
|
90 |
-
|
91 |
-
|
92 |
-
|
|
93 |
-
|
94 |
-
|
|
95 |
-
|
96 |
-
|
97 |
-
|
98 |
-
|
|
99 |
-
|
100 |
-
|
|
101 |
-
| 1 | [biencoder-all-mpnet-base-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-mpnet-base-v2-mmarcoFR) | 28.04 | 33.28 | 27.5 | 51.07 | 77.68 | 88.67 |
|
102 |
-
| 2 | [biencoder-multi-qa-mpnet-base-cos-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-multi-qa-mpnet-base-cos-v1-mmarcoFR) | 27.6 | 32.92 | 27.09 | 50.97 | 77.41 | 87.79 |
|
103 |
-
| 3 | [biencoder-sentence-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-sentence-camembert-base-mmarcoFR) | 27.63 | 32.7 | 27.01 | 50.1 | 76.85 | 88.73 |
|
104 |
-
| 4 | [biencoder-distilcamembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilcamembert-base-mmarcoFR) | 26.8 | 31.87 | 26.23 | 49.2 | 76.44 | 87.87 |
|
105 |
-
| 5 | [biencoder-mpnet-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mpnet-base-mmarcoFR) | 27.2 | 32.22 | 26.63 | 49.41 | 75.71 | 86.88 |
|
106 |
-
| 6 | [biencoder-multi-qa-distilbert-cos-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-multi-qa-distilbert-cos-v1-mmarcoFR) | 26.36 | 31.26 | 25.82 | 47.93 | 75.42 | 86.78 |
|
107 |
-
| 7 | [biencoder-bert-base-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-base-uncased-mmarcoFR) | 26.3 | 31.14 | 25.74 | 47.67 | 74.57 | 86.33 |
|
108 |
-
| 8 | [biencoder-msmarco-distilbert-cos-v5-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-msmarco-distilbert-cos-v5-mmarcoFR) | 25.75 | 30.63 | 25.24 | 47.22 | 73.96 | 85.64 |
|
109 |
-
| 9 | [biencoder-all-distilroberta-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-distilroberta-v1-mmarcoFR) | 26.17 | 30.91 | 25.67 | 47.06 | 73.5 | 85.69 |
|
110 |
-
| 10 | [biencoder-all-MiniLM-L6-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-MiniLM-L6-v2-mmarcoFR) | 25.49 | 30.39 | 24.99 | 47.1 | 73.48 | 86.09 |
|
111 |
-
| 11 | [biencoder-distilbert-base-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilbert-base-uncased-mmarcoFR) | 25.18 | 29.83 | 24.64 | 45.77 | 73.16 | 85.13 |
|
112 |
-
| 12 | [biencoder-msmarco-MiniLM-L12-cos-v5-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-msmarco-MiniLM-L12-cos-v5-mmarcoFR) | 26.22 | 30.99 | 25.69 | 47.29 | 73.09 | 84.95 |
|
113 |
-
| 13 | [biencoder-roberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-roberta-base-mmarcoFR) | 25.94 | 30.72 | 25.43 | 46.98 | 73.07 | 84.76 |
|
114 |
-
| 14 | [biencoder-distiluse-base-multilingual-cased-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distiluse-base-multilingual-cased-v1-mmarcoFR) | 24.57 | 29.08 | 24.04 | 44.51 | 72.54 | 85.13 |
|
115 |
-
| 15 | [biencoder-multi-qa-MiniLM-L6-cos-v1-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-multi-qa-MiniLM-L6-cos-v1-mmarcoFR) | 24.72 | 29.58 | 24.25 | 46.05 | 72.19 | 84.6 |
|
116 |
-
| 16 | [biencoder-MiniLM-L12-H384-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-MiniLM-L12-H384-uncased-mmarcoFR) | 25.43 | 30.1 | 24.88 | 46.13 | 72.16 | 83.84 |
|
117 |
-
| 17 | [biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR) | 24.74 | 29.41 | 24.23 | 45.4 | 71.52 | 84.42 |
|
118 |
-
| 18 | [biencoder-electra-base-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-base-discriminator-mmarcoFR) | 24.77 | 29.37 | 24.21 | 45.2 | 70.84 | 83.25 |
|
119 |
-
| 19 | [biencoder-bert-medium-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-medium-mmarcoFR) | 23.86 | 28.56 | 23.39 | 44.47 | 70.57 | 83.58 |
|
120 |
-
| 20 | [biencoder-msmarco-MiniLM-L6-cos-v5-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-msmarco-MiniLM-L6-cos-v5-mmarcoFR) | 24.39 | 28.96 | 23.91 | 44.58 | 70.36 | 82.88 |
|
121 |
-
| 21 | [biencoder-distilroberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilroberta-base-mmarcoFR) | 23.94 | 28.44 | 23.46 | 43.77 | 70.08 | 82.86 |
|
122 |
-
| 22 | [biencoder-camemberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camemberta-base-mmarcoFR) | 24.78 | 29.24 | 24.23 | 44.58 | 69.59 | 82.18 |
|
123 |
-
| 23 | [biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR) | 23.38 | 27.97 | 22.91 | 43.5 | 68.96 | 81.61 |
|
124 |
-
| 24 | [biencoder-bert-small-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-small-mmarcoFR) | 22.4 | 26.84 | 21.95 | 41.96 | 68.88 | 82.14 |
|
125 |
-
| 25 | [biencoder-mMiniLM-L6-v2-mmarcoFR-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLM-L6-v2-mmarcoFR-v2-mmarcoFR) | 22.87 | 27.26 | 22.37 | 42.3 | 68.78 | 81.39 |
|
126 |
-
| 26 | [biencoder-MiniLM-L6-H384-uncased-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-MiniLM-L6-H384-uncased-mmarcoFR) | 22.86 | 27.34 | 22.41 | 42.62 | 68.4 | 81.54 |
|
127 |
-
| 27 | [biencoder-deberta-v3-small-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-deberta-v3-small-mmarcoFR) | 22.44 | 26.84 | 21.97 | 41.84 | 68.17 | 80.9 |
|
128 |
-
| 28 | **biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR** | 22.29 | 26.57 | 21.8 | 41.25 | 66.78 | 79.83 |
|
129 |
-
| 29 | [biencoder-bert-mini-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-mini-mmarcoFR) | 20.06 | 24.09 | 19.66 | 37.78 | 64.27 | 77.39 |
|
130 |
-
| 30 | [biencoder-electra-small-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-small-discriminator-mmarcoFR) | 20.32 | 24.36 | 19.9 | 38.16 | 63.98 | 77.23 |
|
131 |
-
| 31 | [biencoder-deberta-v3-xsmall-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-deberta-v3-xsmall-mmarcoFR) | 17.7 | 21.29 | 17.31 | 33.59 | 58.76 | 73.45 |
|
132 |
-
| 32 | [biencoder-bert-tiny-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-bert-tiny-mmarcoFR) | 14.94 | 18.22 | 14.59 | 29.46 | 51.94 | 66.3 |
|
133 |
-
| 33 | [biencoder-t5-small-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-t5-small-mmarcoFR) | 12.44 | 15.1 | 12.14 | 24.28 | 47.82 | 63.37 |
|
134 |
-
|
135 |
-
|
136 |
|
137 |
## Training
|
138 |
***
|
@@ -154,8 +112,6 @@ We used the French version of the [mMARCO](https://huggingface.co/datasets/unica
|
|
154 |
- a smaller dev set of 6,980 queries (which is actually used for evaluation in most published works).
|
155 |
Link: [https://ir-datasets.com/mmarco.html#mmarco/v2/fr/](https://ir-datasets.com/mmarco.html#mmarco/v2/fr/)
|
156 |
|
157 |
-
|
158 |
-
|
159 |
## Citation
|
160 |
|
161 |
```bibtex
|
|
|
6 |
- unicamp-dl/mmarco
|
7 |
metrics:
|
8 |
- recall
|
|
|
|
|
9 |
tags:
|
|
|
10 |
- feature-extraction
|
11 |
- sentence-similarity
|
12 |
+
library_name: sentence-transformers
|
|
|
13 |
---
|
14 |
|
15 |
# biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR
|
|
|
38 |
print(embeddings)
|
39 |
```
|
40 |
|
|
|
|
|
41 |
#### 🤗 Transformers
|
42 |
|
43 |
Without [sentence-transformers](https://www.SBERT.net), you can use the model as follows: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
|
|
|
75 |
print(sentence_embeddings)
|
76 |
```
|
77 |
|
|
|
|
|
78 |
## Evaluation
|
79 |
***
|
80 |
|
81 |
+
We evaluated our model on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compare the model's performance with other biencoder models fine-tuned on the same dataset. We report the mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), mean average precision (MAP), and recall at various cut-offs (R@k).
|
82 |
|
83 |
+
| | model | Size | MRR@10 | NDCG@10 | MAP@10 | R@10 | R@100(↑) | R@500 |
|
84 |
+
|---:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------:|---------:|----------:|---------:|-------:|-----------:|--------:|
|
85 |
+
| 1 | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR) | 443MB | 28.53 | 33.72 | 27.93 | 51.46 | 77.82 | 89.13 |
|
86 |
+
| 2 | [biencoder-all-mpnet-base-v2-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-all-mpnet-base-v2-mmarcoFR) | 438MB | 28.04 | 33.28 | 27.5 | 51.07 | 77.68 | 88.67 |
|
87 |
+
| 3 | [biencoder-sentence-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-sentence-camembert-base-mmarcoFR) | 443MB | 27.63 | 32.7 | 27.01 | 50.10 | 76.85 | 88.73 |
|
88 |
+
| 4 | [biencoder-distilcamembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-distilcamembert-base-mmarcoFR) | 272MB | 26.80 | 31.87 | 26.23 | 49.20 | 76.44 | 87.87 |
|
89 |
+
| 5 | [biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR) | 471MB | 24.74 | 29.41 | 24.23 | 45.40 | 71.52 | 84.42 |
|
90 |
+
| 6 | [biencoder-camemberta-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camemberta-base-mmarcoFR) | 447MB | 24.78 | 29.24 | 24.23 | 44.58 | 69.59 | 82.18 |
|
91 |
+
| 7 | [biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR) | 440MB | 23.38 | 27.97 | 22.91 | 43.50 | 68.96 | 81.61 |
|
92 |
+
| 8 | [biencoder-mMiniLM-L6-v2-mmarco-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-mMiniLM-L6-v2-mmarco-mmarcoFR) | 428MB | 22.87 | 27.26 | 22.37 | 42.3 | 68.78 | 81.39 |
|
93 |
+
| 9 | **biencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR** | 428MB | 22.29 | 26.57 | 21.8 | 41.25 | 66.78 | 79.83 |
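
For readers unfamiliar with the metrics in the table, here is a minimal sketch of how MRR@k and R@k can be computed from ranked retrieval results. It is not the evaluation script used for this model card: the function names, the `runs`/`qrels` dictionaries, and the toy passage ids are all hypothetical, and ranked lists are assumed to contain no duplicate passage ids. The table reports these scores as percentages, whereas the sketch returns fractions in [0, 1].

```python
from typing import Dict, Iterable, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Return 1/rank of the first relevant passage in the top k, or 0.0 if none."""
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def recall(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Return the fraction of relevant passages found in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def evaluate(
    runs: Dict[str, List[str]],    # hypothetical: query id -> ranked passage ids
    qrels: Dict[str, Set[str]],    # hypothetical: query id -> relevant passage ids
    mrr_k: int = 10,
    recall_ks: Iterable[int] = (10, 100, 500),
) -> Dict[str, float]:
    """Average each metric over all queries that have relevance judgments."""
    n = len(qrels)
    if n == 0:
        return {}
    scores = {
        f"MRR@{mrr_k}": sum(
            reciprocal_rank(runs.get(q, []), rel, mrr_k) for q, rel in qrels.items()
        ) / n
    }
    for k in recall_ks:
        scores[f"R@{k}"] = sum(
            recall(runs.get(q, []), rel, k) for q, rel in qrels.items()
        ) / n
    return scores


# Toy data (made up for illustration only).
runs = {"q1": ["p3", "p1", "p7"], "q2": ["p5", "p9", "p2"]}
qrels = {"q1": {"p1"}, "q2": {"p2", "p4"}}

scores = evaluate(runs, qrels, mrr_k=10, recall_ks=(2, 3))
print(scores)
```

Queries missing from `runs` are scored as zero rather than skipped, so the averages stay comparable across models that fail to return results for some queries.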
|
94 |
|
95 |
## Training
|
96 |
***
|
|
|
112 |
- a smaller dev set of 6,980 queries (which is actually used for evaluation in most published works).
|
113 |
Link: [https://ir-datasets.com/mmarco.html#mmarco/v2/fr/](https://ir-datasets.com/mmarco.html#mmarco/v2/fr/)
|
114 |
|
|
|
|
|
115 |
## Citation
|
116 |
|
117 |
```bibtex
|