Update README.md
README.md CHANGED
@@ -1056,3 +1056,11 @@ model-index:
---

## piccolo-base-zh

piccolo is a general text embedding model, developed by the General Model Group at SenseTime Research.
Based on the BERT framework, piccolo is trained with a two-stage pipeline. In the first stage, we collect and crawl 400 million weakly supervised Chinese text pairs from the Internet and train the model with a pair (text, text_pos) softmax contrastive loss.
In the second stage, we collect 20 million human-labeled Chinese text pairs from open-source datasets and fine-tune the model with a triplet (text, text_pos, text_neg) contrastive loss.
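Both objectives can be viewed as InfoNCE-style contrastive losses. The sketch below only illustrates the setup described above and is not the official training code; the temperature, embedding normalization, and the use of in-batch negatives are assumptions.

```python
# Illustrative sketch of the two training losses (not the official code).
import torch
import torch.nn.functional as F

def pair_softmax_contrastive_loss(q, p, temperature=0.05):
    """Stage 1: (text, text_pos) pairs, softmax over in-batch negatives.

    q, p: [batch, dim] embeddings of the texts and their positives.
    """
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.T / temperature               # [batch, batch] similarities
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)       # match each text to its own positive

def triplet_contrastive_loss(q, pos, neg, temperature=0.05):
    """Stage 2: (text, text_pos, text_neg) triplets.

    Each text is contrasted against all in-batch positives plus all
    hard negatives; the correct candidate is its own positive.
    """
    q, pos, neg = (F.normalize(t, dim=-1) for t in (q, pos, neg))
    candidates = torch.cat([pos, neg], dim=0)    # [2*batch, dim]
    logits = q @ candidates.T / temperature      # [batch, 2*batch]
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```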
Currently we offer two model sizes: piccolo-base-zh and piccolo-large-zh.
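Since piccolo is a BERT-style encoder, it can be used like other Chinese text embedding models. The snippet below is a minimal sketch assuming the checkpoint is published on the Hugging Face Hub under an ID such as `sensenova/piccolo-base-zh` and is compatible with the sentence-transformers API; both the repository ID and the API compatibility are assumptions.

```python
# Minimal usage sketch; the repository ID is an assumption.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sensenova/piccolo-base-zh")
sentences = ["今天天气不错", "这是一个通用文本向量模型"]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, hidden_size), e.g. (2, 768) for the base model
```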