antoinelouis/biencoder-camembert-base-mmarcoFR

antoinelouis committed on Feb 29, 2024

Commit f4b0d18 • 1 Parent(s): 945fd75

Update README.md

Files changed (1)

README.md +2 -2

@@ -14,7 +14,7 @@ library_name: sentence-transformers
 # biencoder-camembert-base-mmarcoFR
-This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and should be used for semantic search. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
 ## Usage
@@ -114,7 +114,7 @@ We evaluate the model on the smaller development set of [mMARCO-fr](https://ir-d
 #### Data
-We use the French training samples from the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset, a multi-lingual machine-translated version of MS MARCO that contains 8.8M passages and 539K training queries. We do not employ the BM25 negatives provided by the official dataset but instead sample harder negatives mined from 12 distinct dense retrievers, using the [msmarco-hard-negatives](https://huggingface.co/datasets/sentence-transformers/msmarco-hard-negatives) distillation dataset.
 #### Implementation

 # biencoder-camembert-base-mmarcoFR
+This is a dense single-vector bi-encoder model. It maps sentences & paragraphs to a 768 dimensional dense vector space and should be used for semantic search. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) retrieval dataset.
 ## Usage
 #### Data
+We use the French training samples from the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset, a multilingual machine-translated version of MS MARCO that contains 8.8M passages and 539K training queries. We do not employ the BM25 negatives provided by the official dataset but instead sample harder negatives mined from 12 distinct dense retrievers, using the [msmarco-hard-negatives](https://huggingface.co/datasets/sentence-transformers/msmarco-hard-negatives) distillation dataset.
 #### Implementation
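
The updated description calls the model a dense single-vector bi-encoder for semantic search: each query and each passage is mapped to one vector, and passages are ranked by how close their vector is to the query's. The sketch below illustrates only that ranking step. It assumes cosine similarity as the scoring function (the diff does not say which one the model uses), and it uses hand-written 4-dimensional toy vectors in place of real 768-dimensional embeddings. The names `cosine_similarity` and `rank_passages` are hypothetical helpers, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from most to least similar."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(passage_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (assumption: real ones would have 768 dimensions).
query = [1.0, 0.0, 0.0, 0.0]
passages = [
    [0.0, 1.0, 0.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0, 0.0],  # halfway between
]

ranking = rank_passages(query, passages)
for index, score in ranking:
    print(f"passage {index}: {score:.4f}")
```

In practice the vectors would come from encoding text with the model, and a large collection would usually be searched with an approximate nearest-neighbour index instead of this exhaustive loop.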