Commit f4b0d18 by antoinelouis
Parent(s): 945fd75
Update README.md
README.md CHANGED
@@ -14,7 +14,7 @@ library_name: sentence-transformers
# biencoder-camembert-base-mmarcoFR
-This is a
## Usage
@@ -114,7 +114,7 @@ We evaluate the model on the smaller development set of [mMARCO-fr](https://ir-d
#### Data
-We use the French training samples from the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset, a
#### Implementation
# biencoder-camembert-base-mmarcoFR
+This is a dense single-vector bi-encoder model. It maps sentences and paragraphs to a 768-dimensional dense vector space and should be used for semantic search. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) retrieval dataset.
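To make the semantic-search use case concrete, here is a minimal sketch of how a single-vector bi-encoder is typically used: queries and passages are embedded independently, then passages are ranked by vector similarity. Several things are assumptions rather than facts from this commit: the Hub id in `MODEL_ID` is inferred from the author name and model title, cosine similarity is assumed as the scoring function, and `embed`, `cosine`, and `rank_passages` are hypothetical helper names. The ranking step is demonstrated on toy 3-dimensional vectors so it runs without downloading anything.

```python
import math

# Assumed Hub id (author name + model title shown on this page); not confirmed here.
MODEL_ID = "antoinelouis/biencoder-camembert-base-mmarcoFR"


def embed(texts, model_id=MODEL_ID):
    """Encode texts into dense vectors (requires sentence-transformers and network access)."""
    # Imported lazily so the ranking helpers below work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(texts, normalize_embeddings=True).tolist()


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_passages(query_vec, passage_vecs):
    """Return passage indices sorted by decreasing similarity to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for real 768-dimensional embeddings.
    query = [1.0, 0.0, 0.0]
    passages = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
    print(rank_passages(query, passages))
```

With real data, `embed(["une question"])[0]` would supply the query vector and `embed(list_of_passages)` the passage vectors; the ranking logic stays the same.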
## Usage
#### Data
+We use the French training samples from the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset, a multilingual machine-translated version of MS MARCO that contains 8.8M passages and 539K training queries. We do not employ the BM25 negatives provided by the official dataset but instead sample harder negatives mined from 12 distinct dense retrievers, using the [msmarco-hard-negatives](https://huggingface.co/datasets/sentence-transformers/msmarco-hard-negatives) distillation dataset.
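The sampling of hard negatives described above could look roughly like the sketch below. It is not the author's training code. It assumes each record follows a `{"qid", "pos", "neg"}` layout in which `neg` maps a retriever name to a list of passage ids; that layout, the retriever names, and the helper `sample_hard_negatives` are illustrative assumptions. The idea is to pool negatives across retrievers, drop duplicates and any passage that is also a positive, and then draw a random subset.

```python
import random


def sample_hard_negatives(entry, num_negatives, rng):
    """Pool negatives from every retriever, drop positives and duplicates, then sample."""
    positives = set(entry["pos"])
    pool, seen = [], set()
    # Iterate retrievers in sorted order so results are reproducible for a given seed.
    for system in sorted(entry["neg"]):
        for pid in entry["neg"][system]:
            if pid not in positives and pid not in seen:
                seen.add(pid)
                pool.append(pid)
    # Never request more items than the pool contains.
    k = min(num_negatives, len(pool))
    return rng.sample(pool, k)


# Toy record with hypothetical retriever names and passage ids.
toy_entry = {
    "qid": 1,
    "pos": [10],
    "neg": {
        "retriever_a": [11, 12, 10],  # 10 is a positive and must be filtered out
        "retriever_b": [12, 13],      # 12 is a duplicate across retrievers
    },
}

negs = sample_hard_negatives(toy_entry, 2, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible without touching the global random state.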
#### Implementation