Update README.md
README.md CHANGED
@@ -1056,3 +1056,11 @@ model-index:
---

## piccolo-base-zh

piccolo is a general text embedding model, developed by the General Model Group at SenseTime Research.
Based on the BERT framework, piccolo is trained with a two-stage pipeline. In the first stage, we collect and crawl 400 million weakly supervised Chinese text pairs from the Internet and train the model with a pair (text, text_pos) softmax contrastive loss.
In the second stage, we collect 20 million human-labeled Chinese text pairs from open-source datasets and fine-tune the model with a triplet (text, text_pos, text_neg) contrastive loss.
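Both objectives can be viewed as InfoNCE-style contrastive losses. The sketch below only illustrates the setup described above and is not the official training code; the temperature, embedding normalization, and the use of in-batch negatives are assumptions.

```python
# Illustrative sketch of the two training losses (not the official code).
import torch
import torch.nn.functional as F

def pair_softmax_contrastive_loss(q, p, temperature=0.05):
    """Stage 1: (text, text_pos) pairs, softmax over in-batch negatives.

    q, p: [batch, dim] embeddings of the texts and their positives.
    """
    q = F.normalize(q, dim=-1)
    p = F.normalize(p, dim=-1)
    logits = q @ p.T / temperature               # [batch, batch] similarities
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)       # match each text to its own positive

def triplet_contrastive_loss(q, pos, neg, temperature=0.05):
    """Stage 2: (text, text_pos, text_neg) triplets.

    Each text is contrasted against all in-batch positives plus all
    hard negatives; the correct candidate is its own positive.
    """
    q, pos, neg = (F.normalize(t, dim=-1) for t in (q, pos, neg))
    candidates = torch.cat([pos, neg], dim=0)    # [2*batch, dim]
    logits = q @ candidates.T / temperature      # [batch, 2*batch]
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```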
Currently we offer two model sizes: piccolo-base-zh and piccolo-large-zh.
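Since piccolo is a BERT-style encoder, it can be used like other Chinese text embedding models. The snippet below is a minimal sketch assuming the checkpoint is published on the Hugging Face Hub under an ID such as `sensenova/piccolo-base-zh` and is compatible with the sentence-transformers API; both the repository ID and the API compatibility are assumptions.

```python
# Minimal usage sketch; the repository ID is an assumption.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sensenova/piccolo-base-zh")
sentences = ["今天天气不错", "这是一个通用文本向量模型"]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, hidden_size), e.g. (2, 768) for the base model
```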