Shitao committed
Commit d5ab3da
1 Parent(s): 4277867

Update README.md

Files changed (1)
  1. README.md +21 -13
README.md CHANGED
@@ -29,14 +29,24 @@ Utilizing the re-ranking model (e.g., [bge-reranker](https://github.com/FlagOpen
  - 2/1/2024: **Thanks for the excellent tool from Vespa.** You can easily use multiple modes of BGE-M3 following this [notebook](https://github.com/vespa-engine/pyvespa/blob/master/docs/sphinx/source/examples/mother-of-all-embedding-models-cloud.ipynb)
 
 
- ## Model Specs
+ ## Specs
 
- | Model Name | Dimension | Sequence Length |
- |:----:|:---:|:---:|
- | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 1024 | 8192 |
- | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 |
- | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 |
- | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 |
+ - Model
+ | Model Name | Dimension | Sequence Length | Introduction |
+ |:----:|:---:|:---:|:---:|
+ | [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) | 1024 | 8192 | multilingual; unified fine-tuning (dense, sparse, and ColBERT) from bge-m3-unsupervised |
+ | [BAAI/bge-m3-unsupervised](https://huggingface.co/BAAI/bge-m3-unsupervised) | 1024 | 8192 | multilingual; contrastive learning from bge-m3-retromae |
+ | [BAAI/bge-m3-retromae](https://huggingface.co/BAAI/bge-m3-retromae) | -- | 8192 | multilingual; extends the max_length of [xlm-roberta](https://huggingface.co/FacebookAI/xlm-roberta-large) to 8192 and is further pretrained via [RetroMAE](https://github.com/staoxiao/RetroMAE) |
+ | [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5) | 1024 | 512 | English model |
+ | [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) | 768 | 512 | English model |
+ | [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) | 384 | 512 | English model |
+ 
+ - Data
+ 
+ | Dataset | Introduction |
+ |:----:|:---:|
+ | [MLDR](https://huggingface.co/datasets/Shitao/MLDR) | Document Retrieval Dataset, covering 13 languages |
 
 
 
@@ -232,19 +242,17 @@ Refer to our [report](https://github.com/FlagOpen/FlagEmbedding/blob/master/Flag
 
  **The fine-tuning codes and datasets will be open-sourced in the near future.**
 
- ## Models
 
- We release two versions:
- - BAAI/bge-m3-unsupervised: the model after contrastive learning in a large-scale dataset
- - BAAI/bge-m3: the final model fine-tuned from BAAI/bge-m3-unsupervised
 
  ## Acknowledgement
 
- Thanks the authors of open-sourced datasets, including Miracl, MKQA, NarritiveQA, etc.
+ Thanks to the authors of the open-sourced datasets, including MIRACL, MKQA, NarrativeQA, etc.
+ Thanks to open-source libraries like [Tevatron](https://github.com/texttron/tevatron) and [pyserini](https://github.com/castorini/pyserini).
+ 
 
  ## Citation
 
- If you find this repository useful, please consider giving a star :star: and citation
+ If you find this repository useful, please consider giving a star :star: and a citation.
 
  ```
 
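
The "dense, sparse, and ColBERT" modes named in the new model table are the three kinds of output BGE-M3 returns from a single encoder. As a minimal sketch of what that looks like in practice, the snippet below uses the repository's own FlagEmbedding package and its `BGEM3FlagModel` class; the argument and output-key names follow the package's documented usage and may differ between versions, so treat it as illustrative rather than authoritative.

```python
# Minimal sketch: querying the three BGE-M3 modes via FlagEmbedding
# (assumes `pip install -U FlagEmbedding`; names follow the package's
# documented BGEM3FlagModel usage and may vary across versions).
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)  # fp16 speeds up GPU inference

sentences = [
    "What is BGE M3?",
    "BGE M3 is a multilingual model supporting dense, sparse, and multi-vector retrieval.",
]

out = model.encode(
    sentences,
    return_dense=True,         # 1024-dim dense vectors (the Dimension column above)
    return_sparse=True,        # lexical token -> weight sparse representations
    return_colbert_vecs=True,  # per-token vectors for ColBERT-style late interaction
)

dense_vecs = out["dense_vecs"]            # array of shape (2, 1024)
lexical_weights = out["lexical_weights"]  # list of {token: weight} dicts
colbert_vecs = out["colbert_vecs"]        # list of (num_tokens, dim) arrays
print(dense_vecs.shape, len(lexical_weights), len(colbert_vecs))
```

Dense similarity is an inner product over `dense_vecs`; the sparse and ColBERT outputs can be combined with the dense score for hybrid retrieval, which is what the "unified fine-tuning" entry for bge-m3 refers to.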
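
The MLDR row in the new data table points at a Hugging Face dataset, so a natural companion is a loading sketch with the `datasets` library. The configuration names ("en", "corpus-en") and split names below are assumptions for illustration only; the dataset card lists the exact configurations for the 13 languages.

```python
# Hypothetical sketch of loading one MLDR language with the `datasets` library.
# Configuration and split names are assumptions; see
# https://huggingface.co/datasets/Shitao/MLDR for the exact ones.
from datasets import load_dataset

queries = load_dataset("Shitao/MLDR", "en", split="test")          # queries with relevance annotations
corpus = load_dataset("Shitao/MLDR", "corpus-en", split="corpus")  # long documents to retrieve from

print(queries[0])
print(corpus[0])
```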