In the second stage, we collect 20 million human-labeled Chinese text pairs from open-source datasets and fine-tune the model with a triplet (text, text_pos, text_neg) contrastive loss.

Currently we offer two model sizes: piccolo-base-zh and piccolo-large-zh.
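The triplet objective above can be sketched as a softmax contrastive loss in which, for each anchor text, the positive must score higher than its hard negative. This is a minimal NumPy illustration only, not the project's actual training code; the function name and the temperature value are assumptions:

```python
import numpy as np

def triplet_contrastive_loss(text_emb, pos_emb, neg_emb, temperature=0.05):
    """Softmax contrastive loss over (text, text_pos, text_neg) triplets.

    Each anchor embedding is scored by cosine similarity against its
    positive and its hard negative; the loss is cross-entropy with the
    positive as the correct class. (In-batch negatives, as commonly used
    in practice, would simply add more columns to the logits.)
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalize(text_emb), normalize(pos_emb), normalize(neg_emb)
    sim_pos = np.sum(a * p, axis=-1) / temperature  # cosine sim to positive
    sim_neg = np.sum(a * n, axis=-1) / temperature  # cosine sim to hard negative

    logits = np.stack([sim_pos, sim_neg], axis=-1)
    logits -= logits.max(axis=-1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-log_probs[:, 0].mean())           # -log softmax(positive)
```

The loss goes to zero as positives become much more similar to the anchor than the hard negatives, which is the behavior the fine-tuning stage relies on.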
## Metric
We compared piccolo with other open-source embedding models on the C-MTEB benchmark; please refer to the C-MTEB leaderboard. We provide scripts for reproducing the results in the `eval` folder.
| Model Name | Model Size (GB) | Dimension | Sequence Length | Average (35) | Classification (9) | Clustering (4) | Pair Classification (2) | Reranking (4) | Retrieval (8) | STS (8) |
|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| [**piccolo-large-zh**] | 0.65 | 1024 | 512 | **64.11** | 67.03 | 47.04 | 78.38 | 65.98 | 70.93 | 58.02 |
| [bge-large-zh] | 1.3 | 1024 | 512 | 63.96 | 68.32 | 48.39 | 78.94 | 65.11 | 71.52 | 54.98 |
| [**piccolo-base-zh**] | 0.2 | 768 | 512 | **63.66** | 66.98 | 47.12 | 76.61 | 66.68 | 71.2 | 55.9 |
| [bge-large-zh-no-instruct] | 1.3 | 1024 | 512 | 63.4 | 68.58 | 50.01 | 76.77 | 64.9 | 70.54 | 53 |
| [bge-base-zh] | 0.41 | 768 | 512 | 62.8 | 67.07 | 47.64 | 77.5 | 64.91 | 69.53 | 54.12 |