--- tags: - mteb - qihoo360 - 奇虎360 - RAG-retrieval model-index: - name: 360Zhinao_search results: - task: type: Reranking dataset: type: C-MTEB/CMedQAv1-reranking name: MTEB CMedQAv1 config: default split: test revision: None metrics: - type: map value: 87.004722953844 - type: mrr value: 89.34686507936507 - task: type: Reranking dataset: type: C-MTEB/CMedQAv2-reranking name: MTEB CMedQAv2 config: default split: test revision: None metrics: - type: map value: 88.48306990136507 - type: mrr value: 90.57761904761904 - task: type: Reranking dataset: type: C-MTEB/Mmarco-reranking name: MTEB MMarcoReranking config: default split: dev revision: None metrics: - type: map value: 32.40909999537645 - type: mrr value: 31.48690476190476 - task: type: Reranking dataset: type: C-MTEB/T2Reranking name: MTEB T2Reranking config: default split: dev revision: None metrics: - type: map value: 67.80300509862872 - type: mrr value: 78.14543234355354 - task: type: Retrieval dataset: type: C-MTEB/CmedqaRetrieval name: MTEB CmedqaRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 27.171 - type: map_at_10 value: 40.109 - type: map_at_100 value: 41.937999999999995 - type: map_at_1000 value: 42.051 - type: map_at_3 value: 35.882999999999996 - type: map_at_5 value: 38.22 - type: mrr_at_1 value: 41.285 - type: mrr_at_10 value: 49.247 - type: mrr_at_100 value: 50.199000000000005 - type: mrr_at_1000 value: 50.245 - type: mrr_at_3 value: 46.837 - type: mrr_at_5 value: 48.223 - type: ndcg_at_1 value: 41.285 - type: ndcg_at_10 value: 46.727000000000004 - type: ndcg_at_100 value: 53.791 - type: ndcg_at_1000 value: 55.706 - type: ndcg_at_3 value: 41.613 - type: ndcg_at_5 value: 43.702999999999996 - type: precision_at_1 value: 41.285 - type: precision_at_10 value: 10.34 - type: precision_at_100 value: 1.6019999999999999 - type: precision_at_1000 value: 0.184 - type: precision_at_3 value: 23.423 - type: precision_at_5 value: 16.914 - type: recall_at_1 value: 27.171 - 
type: recall_at_10 value: 57.04900000000001 - type: recall_at_100 value: 86.271 - type: recall_at_1000 value: 99.02300000000001 - type: recall_at_3 value: 41.528 - type: recall_at_5 value: 48.162 - task: type: Retrieval dataset: type: C-MTEB/CovidRetrieval name: MTEB CovidRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 73.762 - type: map_at_10 value: 81.663 - type: map_at_100 value: 81.87100000000001 - type: map_at_1000 value: 81.877 - type: map_at_3 value: 80.10199999999999 - type: map_at_5 value: 81.162 - type: mrr_at_1 value: 74.078 - type: mrr_at_10 value: 81.745 - type: mrr_at_100 value: 81.953 - type: mrr_at_1000 value: 81.959 - type: mrr_at_3 value: 80.25999999999999 - type: mrr_at_5 value: 81.266 - type: ndcg_at_1 value: 73.973 - type: ndcg_at_10 value: 85.021 - type: ndcg_at_100 value: 85.884 - type: ndcg_at_1000 value: 86.02300000000001 - type: ndcg_at_3 value: 82.03399999999999 - type: ndcg_at_5 value: 83.905 - type: precision_at_1 value: 73.973 - type: precision_at_10 value: 9.631 - type: precision_at_100 value: 1 - type: precision_at_1000 value: 0.101 - type: precision_at_3 value: 29.329 - type: precision_at_5 value: 18.546000000000003 - type: recall_at_1 value: 73.762 - type: recall_at_10 value: 95.258 - type: recall_at_100 value: 98.946 - type: recall_at_1000 value: 100 - type: recall_at_3 value: 87.46000000000001 - type: recall_at_5 value: 91.93900000000001 - task: type: Retrieval dataset: type: C-MTEB/DuRetrieval name: MTEB DuRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 25.967000000000002 - type: map_at_10 value: 79.928 - type: map_at_100 value: 82.76400000000001 - type: map_at_1000 value: 82.794 - type: map_at_3 value: 54.432 - type: map_at_5 value: 69.246 - type: mrr_at_1 value: 89 - type: mrr_at_10 value: 92.81 - type: mrr_at_100 value: 92.857 - type: mrr_at_1000 value: 92.86 - type: mrr_at_3 value: 92.467 - type: mrr_at_5 value: 92.67699999999999 - type: ndcg_at_1 
value: 89 - type: ndcg_at_10 value: 87.57000000000001 - type: ndcg_at_100 value: 90.135 - type: ndcg_at_1000 value: 90.427 - type: ndcg_at_3 value: 84.88900000000001 - type: ndcg_at_5 value: 84.607 - type: precision_at_1 value: 89 - type: precision_at_10 value: 42.245 - type: precision_at_100 value: 4.8340000000000005 - type: precision_at_1000 value: 0.49 - type: precision_at_3 value: 75.883 - type: precision_at_5 value: 64.88000000000001 - type: recall_at_1 value: 25.967000000000002 - type: recall_at_10 value: 89.79599999999999 - type: recall_at_100 value: 98.042 - type: recall_at_1000 value: 99.61 - type: recall_at_3 value: 57.084 - type: recall_at_5 value: 74.763 - task: type: Retrieval dataset: type: C-MTEB/EcomRetrieval name: MTEB EcomRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 53.6 - type: map_at_10 value: 63.94800000000001 - type: map_at_100 value: 64.37899999999999 - type: map_at_1000 value: 64.39200000000001 - type: map_at_3 value: 61.683 - type: map_at_5 value: 63.078 - type: mrr_at_1 value: 53.6 - type: mrr_at_10 value: 63.94800000000001 - type: mrr_at_100 value: 64.37899999999999 - type: mrr_at_1000 value: 64.39200000000001 - type: mrr_at_3 value: 61.683 - type: mrr_at_5 value: 63.078 - type: ndcg_at_1 value: 53.6 - type: ndcg_at_10 value: 68.904 - type: ndcg_at_100 value: 71.019 - type: ndcg_at_1000 value: 71.345 - type: ndcg_at_3 value: 64.30799999999999 - type: ndcg_at_5 value: 66.8 - type: precision_at_1 value: 53.6 - type: precision_at_10 value: 8.44 - type: precision_at_100 value: 0.943 - type: precision_at_1000 value: 0.097 - type: precision_at_3 value: 23.967 - type: precision_at_5 value: 15.58 - type: recall_at_1 value: 53.6 - type: recall_at_10 value: 84.39999999999999 - type: recall_at_100 value: 94.3 - type: recall_at_1000 value: 96.8 - type: recall_at_3 value: 71.89999999999999 - type: recall_at_5 value: 77.9 - task: type: Retrieval dataset: type: C-MTEB/MMarcoRetrieval name: MTEB MMarcoRetrieval 
config: default split: dev revision: None metrics: - type: map_at_1 value: 71.375 - type: map_at_10 value: 80.05600000000001 - type: map_at_100 value: 80.28699999999999 - type: map_at_1000 value: 80.294 - type: map_at_3 value: 78.479 - type: map_at_5 value: 79.51899999999999 - type: mrr_at_1 value: 73.739 - type: mrr_at_10 value: 80.535 - type: mrr_at_100 value: 80.735 - type: mrr_at_1000 value: 80.742 - type: mrr_at_3 value: 79.212 - type: mrr_at_5 value: 80.059 - type: ndcg_at_1 value: 73.739 - type: ndcg_at_10 value: 83.321 - type: ndcg_at_100 value: 84.35000000000001 - type: ndcg_at_1000 value: 84.542 - type: ndcg_at_3 value: 80.401 - type: ndcg_at_5 value: 82.107 - type: precision_at_1 value: 73.739 - type: precision_at_10 value: 9.878 - type: precision_at_100 value: 1.039 - type: precision_at_1000 value: 0.106 - type: precision_at_3 value: 30.053 - type: precision_at_5 value: 18.953999999999997 - type: recall_at_1 value: 71.375 - type: recall_at_10 value: 92.84599999999999 - type: recall_at_100 value: 97.49799999999999 - type: recall_at_1000 value: 98.992 - type: recall_at_3 value: 85.199 - type: recall_at_5 value: 89.22 - task: type: Retrieval dataset: type: C-MTEB/MedicalRetrieval name: MTEB MedicalRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 55.60000000000001 - type: map_at_10 value: 61.035 - type: map_at_100 value: 61.541999999999994 - type: map_at_1000 value: 61.598 - type: map_at_3 value: 59.683 - type: map_at_5 value: 60.478 - type: mrr_at_1 value: 55.60000000000001 - type: mrr_at_10 value: 61.035 - type: mrr_at_100 value: 61.541999999999994 - type: mrr_at_1000 value: 61.598 - type: mrr_at_3 value: 59.683 - type: mrr_at_5 value: 60.478 - type: ndcg_at_1 value: 55.60000000000001 - type: ndcg_at_10 value: 63.686 - type: ndcg_at_100 value: 66.417 - type: ndcg_at_1000 value: 67.92399999999999 - type: ndcg_at_3 value: 60.951 - type: ndcg_at_5 value: 62.388 - type: precision_at_1 value: 55.60000000000001 - type: 
precision_at_10 value: 7.199999999999999 - type: precision_at_100 value: 0.8540000000000001 - type: precision_at_1000 value: 0.097 - type: precision_at_3 value: 21.532999999999998 - type: precision_at_5 value: 13.62 - type: recall_at_1 value: 55.60000000000001 - type: recall_at_10 value: 72 - type: recall_at_100 value: 85.39999999999999 - type: recall_at_1000 value: 97.3 - type: recall_at_3 value: 64.60000000000001 - type: recall_at_5 value: 68.10000000000001 - task: type: Retrieval dataset: type: C-MTEB/T2Retrieval name: MTEB T2Retrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 28.314 - type: map_at_10 value: 80.268 - type: map_at_100 value: 83.75399999999999 - type: map_at_1000 value: 83.80499999999999 - type: map_at_3 value: 56.313 - type: map_at_5 value: 69.336 - type: mrr_at_1 value: 91.96 - type: mrr_at_10 value: 93.926 - type: mrr_at_100 value: 94 - type: mrr_at_1000 value: 94.003 - type: mrr_at_3 value: 93.587 - type: mrr_at_5 value: 93.804 - type: ndcg_at_1 value: 91.96 - type: ndcg_at_10 value: 87.12299999999999 - type: ndcg_at_100 value: 90.238 - type: ndcg_at_1000 value: 90.723 - type: ndcg_at_3 value: 88.347 - type: ndcg_at_5 value: 87.095 - type: precision_at_1 value: 91.96 - type: precision_at_10 value: 43.257 - type: precision_at_100 value: 5.064 - type: precision_at_1000 value: 0.517 - type: precision_at_3 value: 77.269 - type: precision_at_5 value: 64.89 - type: recall_at_1 value: 28.314 - type: recall_at_10 value: 85.917 - type: recall_at_100 value: 96.297 - type: recall_at_1000 value: 98.802 - type: recall_at_3 value: 57.75900000000001 - type: recall_at_5 value: 72.287 - task: type: Retrieval dataset: type: C-MTEB/VideoRetrieval name: MTEB VideoRetrieval config: default split: dev revision: None metrics: - type: map_at_1 value: 65.60000000000001 - type: map_at_10 value: 74.502 - type: map_at_100 value: 74.864 - type: map_at_1000 value: 74.875 - type: map_at_3 value: 73.3 - type: map_at_5 value: 74.07000000000001 
- type: mrr_at_1 value: 65.60000000000001 - type: mrr_at_10 value: 74.502 - type: mrr_at_100 value: 74.864 - type: mrr_at_1000 value: 74.875 - type: mrr_at_3 value: 73.3 - type: mrr_at_5 value: 74.07000000000001 - type: ndcg_at_1 value: 65.60000000000001 - type: ndcg_at_10 value: 78.091 - type: ndcg_at_100 value: 79.838 - type: ndcg_at_1000 value: 80.10199999999999 - type: ndcg_at_3 value: 75.697 - type: ndcg_at_5 value: 77.07000000000001 - type: precision_at_1 value: 65.60000000000001 - type: precision_at_10 value: 8.9 - type: precision_at_100 value: 0.971 - type: precision_at_1000 value: 0.099 - type: precision_at_3 value: 27.533 - type: precision_at_5 value: 17.18 - type: recall_at_1 value: 65.60000000000001 - type: recall_at_10 value: 89 - type: recall_at_100 value: 97.1 - type: recall_at_1000 value: 99.1 - type: recall_at_3 value: 82.6 - type: recall_at_5 value: 85.9 license: apache-2.0 library_name: transformers ---

# Model Introduction

360Zhinao-search is built by multi-task fine-tuning of a self-developed BERT base model. It achieves an average score of 75.05 on the C-MTEB-Retrieval benchmark, ranking first at the time of release.

The [C-MTEB-Retrieval leaderboard](https://huggingface.co/spaces/mteb/leaderboard) covers 8 [query, passage] similarity retrieval subtasks in different domains and uses NDCG@10 (Normalized Discounted Cumulative Gain at rank 10) as the evaluation metric.

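As a rough illustration of the metric, the sketch below computes NDCG@10 for a single query with the common linear-gain formulation. The function names `dcg_at_k` and `ndcg_at_k` and the toy relevance labels are hypothetical, and the sketch assumes the input list covers every judged passage for the query; the official C-MTEB scripts may differ in details such as tie handling.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    # Rank 0 is the top result, so the discount is log2(rank + 2).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """NDCG@k for one query, given relevance labels in the order the model ranked them.

    Assumption: ranked_relevances contains all judged passages for the query,
    so sorting it in descending order gives the ideal ranking.
    """
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant passage exists for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Toy example: the only relevant passage was ranked second.
score = ndcg_at_k([0, 1, 0, 0], k=10)
print(round(score, 4))
```

A benchmark score is then the mean of this per-query value over all queries in a subtask, reported on a 0 to 100 scale in the table.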
| Model | T2Retrieval | MMarcoRetrieval | DuRetrieval | CovidRetrieval | CmedqaRetrieval | EcomRetrieval | MedicalRetrieval | VideoRetrieval | Avg |
|:-------------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
|**360Zhinao-search** | 87.12 | 83.32 | 87.57 | 85.02 | 46.73 | 68.9 | 63.69 | 78.09 | **75.05** |
|AGE_Hybrid | 86.88 | 80.65 | 89.28 | 83.66 | 47.26 | 69.28 | 65.94 | 76.79 | 74.97 |
|OpenSearch-text-hybrid | 86.76 | 79.93 | 87.85 | 84.03 | 46.56 | 68.79 | 65.92 | 75.43 | 74.41 |
|piccolo-large-zh-v2 | 86.14 | 79.54 | 89.14 | 86.78 | 47.58 | 67.75 | 64.88 | 73.1 | 74.36 |
|stella-large-zh-v3-1792d | 85.56 | 79.14 | 87.13 | 82.44 | 46.87 | 68.62 | 65.18 | 73.89 | 73.6 |

## Optimization points

1. Data filtering: strictly prevent leakage of the C-MTEB-Retrieval test data by cleaning all test-set queries and passages out of the training data;
2. Data source enhancement: use open-source data and LLM-synthesized data to improve data diversity;
3. Negative example mining: use multiple methods to mine hard negatives (negatives that are difficult to distinguish from positives) to improve information gain;
4. Training efficiency: multi-machine, multi-GPU training with DeepSpeed to optimize GPU memory utilization.

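The list above does not say which mining methods were used. One common approach, shown here purely as an illustration and not as the authors' pipeline, is to score every passage against a query with an existing embedding model and keep the highest-scoring passages that are not labeled positive. The function `mine_hard_negatives` and the toy vectors are hypothetical.

```python
from typing import List, Sequence, Set


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(
    query_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    positive_ids: Set[int],
    top_k: int = 2,
) -> List[int]:
    """Return indices of the top_k most similar passages that are not positives."""
    scored = [
        (dot(query_vec, vec), idx)
        for idx, vec in enumerate(passage_vecs)
        if idx not in positive_ids  # never pick a labeled positive as a negative
    ]
    # Highest similarity first: these are the hardest negatives to tell apart.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy embeddings (assumed already L2-normalized by an encoder in practice).
query = [1.0, 0.0]
passages = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.7, 0.3]]
hard_negatives = mine_hard_negatives(query, passages, positive_ids={0}, top_k=2)
print(hard_negatives)  # [1, 3]
```

In practice the very top-ranked candidates are often filtered or skipped, since some of them may be unlabeled positives rather than true negatives.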
## Usage

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('qihoo360/360Zhinao-search')
model = AutoModel.from_pretrained('qihoo360/360Zhinao-search')

# Example pair: "What color is the sky?" / "The sky is blue."
sentences = ['天空是什么颜色的', '天空是蓝色的']
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt', max_length=512)

if __name__ == "__main__":
    with torch.no_grad():
        last_hidden_state = model(**inputs, return_dict=True).last_hidden_state
        # Use the hidden state of the first token as the sentence embedding.
        embeddings = last_hidden_state[:, 0]
        # L2-normalize so that the dot product below equals cosine similarity.
        embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
        embeddings = embeddings.cpu().numpy()

    print("embeddings:")
    print(embeddings)

    cos_sim = np.dot(embeddings[0], embeddings[1])
    print("cos_sim:", cos_sim)
```

## Reference

[bge fine-tuning code](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune)

[C-MTEB official test script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB)

## License

The source code of this repository is released under the Apache 2.0 open-source license.

The 360Zhinao open-source models support commercial use. If you wish to use these models, or continue training them, for commercial purposes, please contact us by email (g-zhinao-opensource@360.cn) to apply. For the specific license terms, see the "360 Zhinao Open-Source Model License".