antoinelouis committed on
Commit 7627c1f · verified · 1 Parent(s): e307298

Update README.md

  1. README.md +10 -5
README.md CHANGED
@@ -7,15 +7,16 @@ datasets:
7
  metrics:
8
  - recall
9
  tags:
10
- - feature-extraction
11
  - sentence-similarity
12
- library_name: colbert
 
 
13
  inference: false
14
  ---
15
 
16
- # colbertv1-camembert-base-mmarcoFR
17
 
18
- This is a [ColBERTv1](https://github.com/stanford-futuredata/ColBERT) model for semantic search. It encodes queries & passages into matrices of token-level embeddings and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
19
 
20
  ## Usage
21
 
@@ -77,6 +78,8 @@ RAG = RAGPretrainedModel.from_index(index_name) # if not already loaded
77
  RAG.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
78
  ```
79
 
 
 
80
  ## Evaluation
81
 
82
  The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compared its performance to a single-vector representation model fine-tuned on the same dataset. We report the mean reciprocal rank (MRR) and recall at various cut-offs (R@k).
@@ -86,6 +89,8 @@ The model is evaluated on the smaller development set of mMARCO-fr, which consis
86
  | **colbertv1-camembert-base-mmarcoFR** | 🇫🇷 | 110M | 443MB | 29.51 | 54.21 | 80.00 | 88.40 |
87
  | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR) | 🇫🇷 | 110M | 443MB | 28.53 | 51.46 | 77.82 | 89.13 |
88
 
 
 
89
  ## Training
90
 
91
  #### Data
@@ -107,7 +112,7 @@ to 128, and the maximum sequence lengths for questions and passages length were
107
  ```bibtex
108
  @online{louis2023,
109
  author = 'Antoine Louis',
110
- title = 'colbertv1-camembert-base-mmarcoFR: A ColBERTv1 Model Trained on French mMARCO',
111
  publisher = 'Hugging Face',
112
  month = 'dec',
113
  year = '2023',
 
7
  metrics:
8
  - recall
9
  tags:
 
10
  - sentence-similarity
11
+ - colbert
12
+ base_model: camembert-base
13
+ library_name: RAGatouille
14
  inference: false
15
  ---
16
 
17
+ # 🇫🇷 colbertv1-camembert-base-mmarcoFR
18
 
19
+ This is a [ColBERTv1](https://doi.org/10.48550/arXiv.2004.12832) model for semantic search. It encodes queries & passages into matrices of token-level embeddings and efficiently finds passages that contextually match the query using scalable vector-similarity (MaxSim) operators. The model was trained on the **French** portion of the [mMARCO](https://huggingface.co/datasets/unicamp-dl/mmarco) dataset.
20
 
21
  ## Usage
22
 
 
78
  RAG.search(query="Comment effectuer une recherche avec ColBERT ?", k=10)
79
  ```
80
 
81
+ ***
82
+
83
  ## Evaluation
84
 
85
  The model is evaluated on the smaller development set of mMARCO-fr, which consists of 6,980 queries for a corpus of 8.8M candidate passages. Below, we compared its performance to a single-vector representation model fine-tuned on the same dataset. We report the mean reciprocal rank (MRR) and recall at various cut-offs (R@k).
 
89
  | **colbertv1-camembert-base-mmarcoFR** | 🇫🇷 | 110M | 443MB | 29.51 | 54.21 | 80.00 | 88.40 |
90
  | [biencoder-camembert-base-mmarcoFR](https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR) | 🇫🇷 | 110M | 443MB | 28.53 | 51.46 | 77.82 | 89.13 |
91
 
92
+ ***
93
+
94
  ## Training
95
 
96
  #### Data
 
112
  ```bibtex
113
  @online{louis2023,
114
  author = 'Antoine Louis',
115
+ title = 'colbertv1-camembert-base-mmarcoFR: A ColBERTv1 Model for French',
116
  publisher = 'Hugging Face',
117
  month = 'dec',
118
  year = '2023',