---
pipeline_tag: sentence-similarity
language: fr
license: apache-2.0
datasets:
- unicamp-dl/mmarco
metrics:
- recall
tags:
- sentence-similarity
library_name: sentence-transformers
---
# crossencoder-distilcamembert-base-mmarcoFR

This is a [sentence-transformers](https://www.SBERT.net) model trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.

It performs cross-attention between a question-passage pair and outputs a relevance score between 0 and 1. The model can be used for tasks like clustering or [semantic search](https://www.sbert.net/examples/applications/retrieve_rerank/README.html): given a query, encode it together with each candidate passage (e.g., retrieved with BM25 or a bi-encoder), then sort the passages in decreasing order of relevance according to the model's predictions.
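
The re-ranking step described above can be sketched as follows. This is a minimal illustration, not the author's pipeline: `rerank` and `toy_score_fn` are hypothetical names, and the toy scorer (simple word overlap) only stands in for the model so that the snippet runs without downloading anything.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    passages: Sequence[str],
    score_fn: Callable[[List[Tuple[str, str]]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and sort passages by decreasing score."""
    pairs = [(query, passage) for passage in passages]
    scores = score_fn(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]


def toy_score_fn(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in scorer: fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(len(query_words & passage_words) / max(len(query_words), 1))
    return scores


# Candidate passages, e.g. as returned by a first-stage retriever.
candidates = [
    "Paris est la capitale de la France.",
    "Le chat dort sur le canapé.",
    "La capitale de la France accueille de nombreux musées.",
]
for passage, score in rerank("capitale de la France", candidates, toy_score_fn):
    print(f"{score:.2f}  {passage}")
```

With the real model, `score_fn` would presumably be the `predict` method of the `CrossEncoder` instance shown in the Usage section, since it takes a list of pairs and returns one score per pair.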

## Usage
***

#### Sentence-Transformers

Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:

```bash
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import CrossEncoder
pairs = [('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')]

# Load the model from the Hugging Face Hub by its full repository id.
model = CrossEncoder('antoinelouis/crossencoder-distilcamembert-base-mmarcoFR')
scores = model.predict(pairs)
print(scores)
```

#### 🤗 Transformers

Without [sentence-transformers](https://www.SBERT.net), you can use the model as follows:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Full repository id on the Hugging Face Hub.
model_name = 'antoinelouis/crossencoder-distilcamembert-base-mmarcoFR'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pairs = [('Query', 'Paragraph1'), ('Query', 'Paragraph2'), ('Query', 'Paragraph3')]
features = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt')

model.eval()
with torch.no_grad():
    # Raw logits; assuming a single-output head, torch.sigmoid(scores) maps them to (0, 1).
    scores = model(**features).logits
print(scores)
```

## Evaluation
***

We evaluated our model on 500 random queries from the mMARCO-fr train set (which were excluded from training). Each of these queries has at least one relevant and up to 200 irrelevant passages.
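
The metrics reported below can be sketched for a single query as follows. The card does not spell out the exact evaluation code, so this assumes the standard definitions; the function names are illustrative, and the reported figures would be averages of such per-query values (expressed as percentages).

```python
from typing import List


def recall_at_k(ranked_labels: List[int], k: int) -> float:
    """Fraction of all relevant passages that appear in the top k."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    return sum(ranked_labels[:k]) / total_relevant


def mrr_at_k(ranked_labels: List[int], k: int) -> float:
    """Reciprocal rank of the first relevant passage in the top k (0 if none)."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def r_precision(ranked_labels: List[int]) -> float:
    """Precision at rank R, where R is the number of relevant passages."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    return sum(ranked_labels[:total_relevant]) / total_relevant


# Hypothetical ranking for one query: 1 = relevant, 0 = irrelevant, best first.
labels = [0, 1, 0, 0, 1, 0]
print(recall_at_k(labels, 2), recall_at_k(labels, 5))
print(mrr_at_k(labels, 10))
print(r_precision(labels))
```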

|   r-precision |   mrr@10 |   recall@10 |   recall@20 |   recall@50 |   recall@100 |
|--------------:|---------:|------------:|------------:|------------:|-------------:|
|         27.28 |    43.71 |        80.3 |        89.1 |       95.55 |         98.6 |

Below, we compared its results with other cross-encoder models fine-tuned on the same dataset:
|    | model                                                                                                                                                                                  |   r-precision |   mrr@10 |   recall@10 (↑) |   recall@20 |   recall@50 |   recall@100 |
|---:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------:|---------:|------------:|------------:|------------:|-------------:|
|  1 | [crossencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-camembert-base-mmarcoFR)                                                                       |         35.65 |    50.44 |       82.95 |       91.5  |       96.8  |        98.8  |
|  2 | [crossencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-mMiniLMv2-L12-H384-distilled-from-XLMR-Large-mmarcoFR)           |         34.37 |    51.01 |       82.23 |       90.6  |       96.45 |        98.4  |
|  3 | [crossencoder-mmarcoFR-mMiniLMv2-L12-H384-v1-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-mmarcoFR-mMiniLMv2-L12-H384-v1-mmarcoFR)                                       |         34.22 |    49.2  |       81.7  |       90.9  |       97.1  |        98.9  |
|  4 | [crossencoder-mpnet-base-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-mpnet-base-mmarcoFR)                                                                               |         29.68 |    46.13 |       80.45 |       87.9  |       93.15 |        96.6  |
|  5 | **crossencoder-distilcamembert-base-mmarcoFR**                                                                                                                                         |         27.28 |    43.71 |       80.3  |       89.1  |       95.55 |        98.6  |
|  6 | [crossencoder-roberta-base-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-roberta-base-mmarcoFR)                                                                           |         33.33 |    48.87 |       79.33 |       86.75 |       94.15 |        97.6  |
|  7 | [crossencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-electra-base-french-europeana-cased-discriminator-mmarcoFR) |         28.32 |    45.28 |       79.22 |       87.15 |       93.15 |        95.75 |
|  8 | [crossencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-mMiniLMv2-L6-H384-distilled-from-XLMR-Large-mmarcoFR)             |         33.92 |    49.33 |       79    |       88.35 |       94.8  |        98.2  |
|  9 | [crossencoder-msmarco-electra-base-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-electra-base-mmarcoFR)                                                           |         25.52 |    42.46 |       78.73 |       88.85 |       96.55 |        98.85 |
| 10 | [crossencoder-bert-base-uncased-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-bert-base-uncased-mmarcoFR)                                                                 |         30.48 |    45.79 |       78.35 |       89.45 |       94.15 |        97.45 |
| 11 | [crossencoder-msmarco-MiniLM-L-12-v2-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-MiniLM-L-12-v2-mmarcoFR)                                                       |         29.07 |    44.41 |       77.83 |       88.1  |       95.55 |        99    |
| 12 | [crossencoder-msmarco-MiniLM-L-6-v2-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-MiniLM-L-6-v2-mmarcoFR)                                                         |         32.92 |    47.56 |       77.27 |       88.15 |       94.85 |        98.15 |
| 13 | [crossencoder-msmarco-MiniLM-L-4-v2-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-MiniLM-L-4-v2-mmarcoFR)                                                         |         30.98 |    46.22 |       76.35 |       85.8  |       94.35 |        97.55 |
| 14 | [crossencoder-MiniLM-L6-H384-uncased-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-MiniLM-L6-H384-uncased-mmarcoFR)                                                       |         29.23 |    45.12 |       76.08 |       83.7  |       92.65 |        97.45 |
| 15 | [crossencoder-electra-base-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-electra-base-discriminator-mmarcoFR)                                               |         28.48 |    43.58 |       75.63 |       86.15 |       93.25 |        96.6  |
| 16 | [crossencoder-electra-small-discriminator-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-electra-small-discriminator-mmarcoFR)                                             |         31.83 |    45.97 |       75.13 |       84.95 |       94.55 |        98.15 |
| 17 | [crossencoder-distilroberta-base-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-distilroberta-base-mmarcoFR)                                                               |         28.22 |    42.85 |       74.13 |       84.08 |       94.2  |        98.5  |
| 18 | [crossencoder-msmarco-TinyBERT-L-6-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-TinyBERT-L-6-mmarcoFR)                                                           |         28.23 |    42.7  |       73.63 |       85.65 |       92.65 |        98.35 |
| 19 | [crossencoder-msmarco-TinyBERT-L-4-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-TinyBERT-L-4-mmarcoFR)                                                           |         28.6  |    43.19 |       72.17 |       81.95 |       92.8  |        97.4  |
| 20 | [crossencoder-msmarco-MiniLM-L-2-v2-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-MiniLM-L-2-v2-mmarcoFR)                                                         |         30.82 |    44.3  |       72.03 |       82.65 |       93.35 |        98.1  |
| 21 | [crossencoder-distilbert-base-uncased-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-distilbert-base-uncased-mmarcoFR)                                                     |         25.47 |    40.11 |       71.37 |       85.6  |       93.85 |        97.95 |
| 22 | [crossencoder-msmarco-TinyBERT-L-2-v2-mmarcoFR](https://huggingface.co/antoinelouis/crossencoder-msmarco-TinyBERT-L-2-v2-mmarcoFR)                                                     |         31.08 |    43.88 |       71.3  |       81.43 |       92.6  |        98.1  |

## Training
***

#### Background

We used the [cmarkea/distilcamembert-base](https://huggingface.co/cmarkea/distilcamembert-base) model and fine-tuned it with a binary cross-entropy loss function on 1M question-passage pairs in French with a positive-to-negative ratio of 4 (i.e., 25% of the pairs are relevant and 75% are irrelevant).
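
For reference, the binary cross-entropy objective mentioned above can be written out in plain Python. This is only a sketch of the standard loss applied to raw logits, not the training code itself; `bce_with_logits` and the sample values are made up for illustration.

```python
import math
from typing import Sequence


def bce_with_logits(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy between sigmoid(logit) and 0/1 labels."""
    total = 0.0
    for logit, label in zip(logits, labels):
        # Numerically stable form of -[y*log(p) + (1-y)*log(1-p)], with p = sigmoid(logit).
        total += max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))
    return total / len(logits)


# Hypothetical mini-batch: one relevant pair and three irrelevant ones (25% / 75%).
labels = [1, 0, 0, 0]
good_logits = [4.0, -3.0, -5.0, -2.0]   # confident and correct
bad_logits = [-4.0, 3.0, 5.0, 2.0]      # confident and wrong
print(bce_with_logits(good_logits, labels))
print(bce_with_logits(bad_logits, labels))
```

The loss is small when relevant pairs receive high logits and irrelevant pairs receive low ones, which is what pushes the sigmoid of the output toward a 0-to-1 relevance score.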

#### Hyperparameters

We trained the model on a single Tesla V100 GPU with 32GB of memory for 10 epochs (i.e., 312.4k steps) using a batch size of 32. We used the AdamW optimizer with an initial learning rate of 2e-05, a weight decay of 0.01, learning rate warmup over the first 500 steps, and linear decay of the learning rate. The sequence length was limited to 512 tokens.
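
The learning rate schedule described above (linear warmup followed by linear decay) can be sketched as a plain function. This assumes the decay runs to zero at the final step and uses 312,400 as an approximation of the rounded "312.4k" step count; the function name is illustrative.

```python
def linear_warmup_decay_lr(
    step: int,
    peak_lr: float = 2e-5,
    warmup_steps: int = 500,
    total_steps: int = 312_400,  # assumed from the rounded "312.4k steps"
) -> float:
    """Learning rate at a given step: linear ramp-up, then linear decay to zero."""
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: shrink linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 156_450, 312_400):
    print(step, linear_warmup_decay_lr(step))
```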

#### Data

We used the French version of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset to fine-tune our model. mMARCO is a multilingual, machine-translated version of the MS MARCO dataset, a popular large-scale information retrieval (IR) dataset.

## Citation
***

```bibtex
@online{louis2023,
   author    = {Antoine Louis},
   title     = {crossencoder-distilcamembert-base-mmarcoFR: A Cross-Encoder Model Trained on 1M sentence pairs in French},
   publisher = {Hugging Face},
   month     = {september},
   year      = {2023},
   url       = {https://huggingface.co/antoinelouis/crossencoder-distilcamembert-base-mmarcoFR},
}
```