Shitao committed
Commit d5ab3da
1 Parent(s): 4277867

Update README.md

Files changed (1)
  1. README.md +21 -13
README.md CHANGED
@@ -29,14 +29,24 @@ Utilizing the re-ranking model (e.g., [bge-reranker](https://github.com/FlagOpen
  - 2/1/2024: **Thanks for the excellent tool from Vespa.** You can easily use multiple modes of BGE-M3 following this [notebook](https://github.com/vespa-engine/pyvespa/blob/master/docs/sphinx/source/examples/mother-of-all-embedding-models-cloud.ipynb)
 
 
- ## Model Specs
+ ## Specs
 
- | Model Name | Dimension | Sequence Length |
- |:----:|:---:|:---:|
- | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 1024 | 8192 |
- | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 |
- | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 |
- | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 |
+ - Model
+ | Model Name | Dimension | Sequence Length | Introduction |
+ |:----:|:---:|:---:|:---:|
+ | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 1024 | 8192 | multilingual; unified fine-tuning (dense, sparse, and ColBERT) from bge-m3-unsupervised |
+ | [BAAI/bge-m3-unsupervised](https://huggingface.co/BAAI/bge-m3-unsupervised) | 1024 | 8192 | multilingual; contrastive learning from bge-m3-retromae |
+ | [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae) | -- | 8192 | multilingual; extends the max_length of [xlm-roberta](https://huggingface.co/FacebookAI/xlm-roberta-large) to 8192 and is further pretrained via [RetroMAE](https://github.com/staoxiao/RetroMAE) |
+ | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | English model |
+ | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | English model |
+ | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | English model |
+ 
+ - Data
+ 
+ | Dataset | Introduction |
+ |:----:|:---:|
+ | [MLDR](https://huggingface.co/datasets/Shitao/MLDR) | Document Retrieval Dataset, covering 13 languages |
 
 
 
@@ -232,19 +242,17 @@ Refer to our [report](https://github.com/FlagOpen/FlagEmbedding/blob/master/Flag
 
  **The fine-tuning codes and datasets will be open-sourced in the near future.**
 
- ## Models
 
- We release two versions:
- - BAAI/bge-m3-unsupervised: the model after contrastive learning in a large-scale dataset
- - BAAI/bge-m3: the final model fine-tuned from BAAI/bge-m3-unsupervised
 
  ## Acknowledgement
 
- Thanks the authors of open-sourced datasets, including Miracl, MKQA, NarritiveQA, etc.
+ Thanks to the authors of the open-sourced datasets, including MIRACL, MKQA, NarrativeQA, etc.
+ Thanks to open-source libraries like [Tevatron](https://github.com/texttron/tevatron) and [pyserini](https://github.com/castorini/pyserini).
+ 
 
  ## Citation
 
- If you find this repository useful, please consider giving a star :star: and citation
+ If you find this repository useful, please consider giving a star :star: and a citation.
 
  ```
 
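
The "dense, sparse, and ColBERT" modes named in the new model table are the three kinds of output BGE-M3 returns from a single encoder. As a minimal sketch of what that looks like in practice, the snippet below uses the repository's own FlagEmbedding package and its `BGEM3FlagModel` class; the argument and output-key names follow the package's documented usage and may differ between versions, so treat it as illustrative rather than authoritative.

```python
# Minimal sketch: querying the three BGE-M3 modes via FlagEmbedding
# (assumes `pip install -U FlagEmbedding`; names follow the package's
# documented BGEM3FlagModel usage and may vary across versions).
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)  # fp16 speeds up GPU inference

sentences = [
    "What is BGE M3?",
    "BGE M3 is a multilingual model supporting dense, sparse, and multi-vector retrieval.",
]

out = model.encode(
    sentences,
    return_dense=True,         # 1024-dim dense vectors (the Dimension column above)
    return_sparse=True,        # lexical token -> weight sparse representations
    return_colbert_vecs=True,  # per-token vectors for ColBERT-style late interaction
)

dense_vecs = out["dense_vecs"]            # array of shape (2, 1024)
lexical_weights = out["lexical_weights"]  # list of {token: weight} dicts
colbert_vecs = out["colbert_vecs"]        # list of (num_tokens, dim) arrays
print(dense_vecs.shape, len(lexical_weights), len(colbert_vecs))
```

Dense similarity is an inner product over `dense_vecs`; the sparse and ColBERT outputs can be combined with the dense score for hybrid retrieval, which is what the "unified fine-tuning" entry for bge-m3 refers to.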
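
The MLDR row in the new data table points at a Hugging Face dataset, so a natural companion is a loading sketch with the `datasets` library. The configuration names ("en", "corpus-en") and split names below are assumptions for illustration only; the dataset card lists the exact configurations for the 13 languages.

```python
# Hypothetical sketch of loading one MLDR language with the `datasets` library.
# Configuration and split names are assumptions; see
# https://huggingface.co/datasets/Shitao/MLDR for the exact ones.
from datasets import load_dataset

queries = load_dataset("Shitao/MLDR", "en", split="test")          # queries with relevance annotations
corpus = load_dataset("Shitao/MLDR", "corpus-en", split="corpus")  # long documents to retrieve from

print(queries[0])
print(corpus[0])
```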