---
license: bigscience-bloom-rail-1.0
---

# Model Card for udever-bloom

<!-- Provide a quick summary of what the model is/does. -->

`udever-bloom-560m` is finetuned from [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m) via [BitFit](https://aclanthology.org/2022.acl-short.1/) on MS MARCO Passage Ranking, SNLI and MultiNLI data.
It is a universal embedding model across tasks, natural and programming languages.
(Technically, `udever` is `sgpt-bloom` with some minor improvements.)

<div align=center><img width="338" height="259" src="https://user-images.githubusercontent.com/26690193/277643721-cdb7f227-cae5-40e1-b6e1-a201bde00339.png" /></div>


## Model Details

### Model Description

- **Developed by:** Alibaba Group
- **Model type:** Transformer-based Language Model (decoder-only)
- **Language(s) (NLP):** Multiple; see the [BLOOM training data](https://huggingface.co/bigscience/bloom-560m#training-data)
- **Finetuned from model:** [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [github.com/izhx/uni-rep](https://github.com/izhx/uni-rep)
- **Paper:** [Language Models are Universal Embedders](https://arxiv.org/pdf/2310.08232.pdf)


## How to Get Started with the Model

Use the code below to get started with the model.
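
The snippet below is a minimal sketch, not the official example: it assumes the checkpoint is published as `izhx/udever-bloom-560m` and uses last-token pooling over the decoder's hidden states; the exact prompt format and pooling used in the paper may differ, so see [github.com/izhx/uni-rep](https://github.com/izhx/uni-rep) for the reference implementation.

```python
# Minimal sketch (assumed repo id and pooling; see the uni-rep repository for the official usage).
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_ID = "izhx/udever-bloom-560m"  # assumption: adjust to the actual HF repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()


@torch.no_grad()
def encode(texts):
    """Embed a list of strings with last-token pooling and L2 normalization."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state                   # (B, T, H)
    # Index of the last non-padding token, valid for left or right padding.
    last = batch["attention_mask"].cumsum(dim=1).argmax(dim=1)  # (B,)
    emb = hidden[torch.arange(hidden.size(0)), last]             # (B, H)
    return torch.nn.functional.normalize(emb, p=2, dim=1)


sentences = ["A man is eating food.", "A man is eating a piece of bread."]
embeddings = encode(sentences)
print(embeddings @ embeddings.T)  # cosine similarity matrix
```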

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- MS MARCO Passage Ranking, retrieved by the [sentence-transformers script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/ms_marco/train_bi-encoder_mnrl.py#L86)
- SNLI and MultiNLI ([AllNLI.tsv.gz](https://sbert.net/datasets/AllNLI.tsv.gz))


### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

MS MARCO hard negatives are those provided by the [sentence-transformers script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/ms_marco/train_bi-encoder_mnrl.py#L86).
Negatives for SNLI and MultiNLI are randomly sampled.
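
As an illustration of the random-negative step, the sketch below builds (anchor, positive, random negative) triplets from `AllNLI.tsv.gz`; the column names and the exact sampling scheme are assumptions for illustration, not the released preprocessing code.

```python
# Illustrative random-negative sampling for NLI pairs (assumed TSV columns).
import csv
import gzip
import random

random.seed(42)

pairs = []   # (premise, entailed hypothesis) -> (anchor, positive)
pool = []    # all hypotheses, used as the pool for random negatives

with gzip.open("AllNLI.tsv.gz", "rt", encoding="utf8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        if row["split"] != "train":
            continue
        pool.append(row["sentence2"])
        if row["label"] == "entailment":
            pairs.append((row["sentence1"], row["sentence2"]))

# Attach one randomly sampled negative to every (anchor, positive) pair.
triplets = [(anchor, pos, random.choice(pool)) for anchor, pos in pairs]
print(len(triplets), triplets[0])
```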

#### Training Hyperparameters

- **Training regime:** tf32, BitFit
- **Batch size:** 1024
- **Epochs:** 3
- **Optimizer:** AdamW
- **Learning rate:** 1e-4
- **Scheduler:** constant with warmup
- **Warmup:** 0.25 epoch
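
A sketch of how these settings could be wired together with `torch` and `transformers` is shown below; BitFit is approximated by unfreezing only bias parameters, and the step counts are placeholders, so this is an illustration under stated assumptions rather than the released training code.

```python
# Hypothetical setup mirroring the hyperparameters above (not the released training code).
import torch
from transformers import AutoModel, get_constant_schedule_with_warmup

model = AutoModel.from_pretrained("bigscience/bloom-560m")

# BitFit: train only bias terms, freeze all other weights.
bitfit_params = []
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")
    if param.requires_grad:
        bitfit_params.append(param)

optimizer = torch.optim.AdamW(bitfit_params, lr=1e-4)

# Constant learning rate after 0.25 epoch of warmup; steps_per_epoch depends on
# the dataset size and the 1024 batch size, so this number is a placeholder.
steps_per_epoch = 1000
scheduler = get_constant_schedule_with_warmup(
    optimizer, num_warmup_steps=int(0.25 * steps_per_epoch)
)

# tf32 matmuls on Ampere GPUs (the "tf32" training regime above).
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
```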


## Evaluation

### Table 1: Massive Text Embedding Benchmark [MTEB](https://arxiv.org/abs/2210.07316)

| MTEB | Avg. | Class. | Clust. | PairClass. | Rerank. | Retr. | STS | Summ. |
|------|------|--------|--------|------------|---------|-------|-----|-------|
| #Datasets ➡️ | 56 | 12 | 11 | 3 | 4 | 15 | 10 | 1 |
||
| bge-large-en-v1.5 | **64.23** | **75.97** | 46.08 | **87.12** | **60.03** | **54.29** | 83.11 | 31.61 |
| bge-base-en-v1.5 | 63.55 | 75.53 | 45.77 | 86.55 | 58.86 | 53.25 | 82.4 | 31.07 |
| gte-large | 63.13 | 73.33 | **46.84** | 85 | 59.13 | 52.22 | **83.35** | 31.66 |
| gte-base | 62.39 | 73.01 | 46.2 | 84.57 | 58.61 | 51.14 | 82.3 | 31.17 |
| e5-large-v2 | 62.25 | 75.24 | 44.49 | 86.03 | 56.61 | 50.56 | 82.05 | 30.19 |
| instructor-xl | 61.79 | 73.12 | 44.74 | 86.62 | 57.29 | 49.26 | 83.06 | 32.32 |
| instructor-large | 61.59 | 73.86 | 45.29 | 85.89 | 57.54 | 47.57 | 83.15 | 31.84 |
| e5-base-v2 | 61.5 | 73.84 | 43.8 | 85.73 | 55.91 | 50.29 | 81.05 | 30.28 |
| e5-large | 61.42 | 73.14 | 43.33 | 85.94 | 56.53 | 49.99 | 82.06 | 30.97 |
| text-embedding-ada-002 (OpenAI API) | 60.99 | 70.93 | 45.9 | 84.89 | 56.32 | 49.25 | 80.97 | 30.8 |
| e5-base | 60.44 | 72.63 | 42.11 | 85.09 | 55.7 | 48.75 | 80.96 | 31.01 |
| SGPT-5.8B-msmarco | 58.93 | 68.13 | 40.34 | 82 | 56.56 | 50.25 | 78.1 | 31.46 |
| sgpt-bloom-7b1-msmarco | 57.59 | 66.19 | 38.93 | 81.9 | 55.65 | 48.22 | 77.74 | **33.6** |
||
| Udever-bloom-560m | 55.80 | 68.04 | 36.89 | 81.05 | 52.60 | 41.19 | 79.93 | 32.06 |
| Udever-bloom-1b1 | 58.28 | 70.18 | 39.11 | 83.11 | 54.28 | 45.27 | 81.52 | 31.10 |
| Udever-bloom-3b | 59.86 | 71.91 | 40.74 | 84.06 | 54.90 | 47.67 | 82.37 | 30.62 |
| Udever-bloom-7b1 | 60.63 | 72.13 | 40.81 | 85.40 | 55.91 | 49.34 | 83.01 | 30.97 |


### Table 2: [CodeSearchNet](https://arxiv.org/abs/1909.09436)

| CodeSearchNet | Go | Ruby | Python | Java | JS | PHP | Avg. |
|-|-|-|-|-|-|-|-|
| CodeBERT | 69.3 | 70.6 | 84.0 | 86.8 | 74.8 | 70.6 | 76.0 |
| GraphCodeBERT | 84.1 | 73.2 | 87.9 | 75.7 | 71.1 | 72.5 | 77.4 |
| cpt-code S | **97.7** | **86.3** | 99.8 | 94.0 | 86.0 | 96.7 | 93.4 |
| cpt-code M | 97.5 | 85.5 | **99.9** | **94.4** | **86.5** | **97.2** | **93.5** |
| sgpt-bloom-7b1-msmarco | 76.79 | 69.25 | 95.68 | 77.93 | 70.35 | 73.45 | 77.24 |
||
| Udever-bloom-560m | 75.38 | 66.67 | 96.23 | 78.99 | 69.39 | 73.69 | 76.73 |
| Udever-bloom-1b1 | 78.76 | 72.85 | 97.67 | 82.77 | 74.38 | 78.97 | 80.90 |
| Udever-bloom-3b | 80.63 | 75.40 | 98.02 | 83.88 | 76.18 | 79.67 | 82.29 |
| Udever-bloom-7b1 | 79.37 | 76.59 | 98.38 | 84.68 | 77.49 | 80.03 | 82.76 |


### Table 3: Chinese multi-domain retrieval, [Multi-CPR](https://dl.acm.org/doi/10.1145/3477495.3531736)

| Model | Train | Backbone | E-commerce MRR@10 | E-commerce Recall@1k | Entertainment video MRR@10 | Entertainment video Recall@1k | Medical MRR@10 | Medical Recall@1k |
|--|--|--|--|--|--|--|--|--|
| BM25 | - | - | 0.225 | 0.815 | 0.225 | 0.780 | 0.187 | 0.482 |
| Doc2Query | - | - | 0.239 | 0.826 | 0.238 | 0.794 | 0.210 | 0.505 |
| DPR-1 | In-Domain | BERT | 0.270 | 0.921 | 0.254 | 0.934 | 0.327 | 0.747 |
| DPR-2 | In-Domain | BERT-CT | 0.289 | **0.926** | 0.263 | **0.935** | 0.339 | **0.769** |
| text-embedding-ada-002 | General | GPT | 0.183 | 0.825 | 0.159 | 0.786 | 0.245 | 0.593 |
| sgpt-bloom-7b1-msmarco | General | BLOOM | 0.242 | 0.840 | 0.227 | 0.829 | 0.311 | 0.675 |
||
| Udever-bloom-560m | General | BLOOM | 0.156 | 0.802 | 0.149 | 0.749 | 0.245 | 0.571 |
| Udever-bloom-1b1 | General | BLOOM | 0.244 | 0.863 | 0.208 | 0.815 | 0.241 | 0.557 |
| Udever-bloom-3b | General | BLOOM | 0.267 | 0.871 | 0.228 | 0.836 | 0.288 | 0.619 |
| Udever-bloom-7b1 | General | BLOOM | **0.296** | 0.889 | **0.267** | 0.907 | **0.343** | 0.705 |

#### More results can be found in Section 3 of the [paper](https://arxiv.org/pdf/2310.08232.pdf).
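
Results like those in Table 1 can be reproduced with the [MTEB](https://github.com/embeddings-benchmark/mteb) toolkit, which only needs an object exposing `encode(list_of_sentences) -> np.ndarray`. The sketch below shows that pattern, reusing the `encode` helper from the quick-start section; the task selection and wrapper details are illustrative, and the `mteb` API may differ across versions.

```python
# Illustrative MTEB run (not the exact evaluation script behind Table 1).
# Assumes the `encode` helper defined in the quick-start snippet above.
import numpy as np
from mteb import MTEB


class UdeverWrapper:
    """Minimal interface expected by MTEB: encode(sentences) -> np.ndarray."""

    def encode(self, sentences, batch_size=32, **kwargs):
        chunks = [
            encode(sentences[i:i + batch_size]).cpu().numpy()
            for i in range(0, len(sentences), batch_size)
        ]
        return np.concatenate(chunks, axis=0)


evaluation = MTEB(tasks=["Banking77Classification", "STSBenchmark"])
evaluation.run(UdeverWrapper(), output_folder="results/udever-bloom-560m")
```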


## Technical Specifications

### Model Architecture and Objective

- Model: [bigscience/bloom-560m](https://huggingface.co/bigscience/bloom-560m).
- Objective: Contrastive loss with hard negatives (see Section 2.2 of the [paper](https://arxiv.org/pdf/2310.08232.pdf)); a generic sketch is shown below.
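
The sketch below illustrates such a loss in its common InfoNCE-style form: a cross-entropy over one positive and one mined hard negative per query, with the other in-batch candidates acting as additional negatives. The temperature and the exact negative handling are assumptions, not necessarily those of Section 2.2.

```python
# Generic contrastive loss with in-batch and hard negatives (illustrative).
import torch
import torch.nn.functional as F


def contrastive_loss(q, p, n, temperature=0.05):
    """q: (B, H) queries, p: (B, H) positives, n: (B, H) hard negatives."""
    q = F.normalize(q, dim=1)
    cand = F.normalize(torch.cat([p, n], dim=0), dim=1)  # (2B, H) candidate pool
    logits = q @ cand.T / temperature                    # (B, 2B) similarities
    labels = torch.arange(q.size(0), device=q.device)    # i-th positive for i-th query
    return F.cross_entropy(logits, labels)


# Example with random tensors standing in for encoder outputs.
B, H = 4, 1024
loss = contrastive_loss(torch.randn(B, H), torch.randn(B, H), torch.randn(B, H))
print(loss.item())
```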


### Compute Infrastructure

- Nvidia A100 SXM4 80GB.
- torch 2.0.0, transformers 4.29.2.


## Citation

**BibTeX:**

```BibTeX
@article{zhang2023language,
  title={Language Models are Universal Embedders},
  author={Zhang, Xin and Li, Zehan and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan and Zhang, Min},
  journal={arXiv preprint arXiv:2310.08232},
  year={2023}
}
```