rmihaylov
/

roberta-base-nli-stsb-theseus-bg

Sentence Similarity

feature-extraction

text-embeddings-inference

rmihaylov committed on Apr 18, 2022

Commit d60f80a (1 parent: 75cb137)

Update README.md

README.md +5 -2

@@ -12,11 +12,14 @@ tags:
 - torch
 ---
-# ROBERTA BASE (cased) finetuned on private Bulgarian STSB, NLI data
 This model is cased: it does make a difference between bulgarian and Bulgarian.
-It was finetuned on private Bulgarian STSB, NLI data.
 Then, it was compressed via [progressive module replacing](https://arxiv.org/abs/2002.02925).

 - torch
 ---
+# ROBERTA BASE (cased) trained on private Bulgarian-English parallel data
+This is a multilingual RoBERTa model. It can be used to create embeddings of Bulgarian sentences.
+Following [Sentence-BERT](https://arxiv.org/abs/2004.09813), training rests on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence.
 This model is cased: it does make a difference between bulgarian and Bulgarian.
+It was trained on private Bulgarian-English parallel data.
 Then, it was compressed via [progressive module replacing](https://arxiv.org/abs/2002.02925).
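
The training idea described in the new README text (a translated sentence should land at the same point in the vector space as the original) can be illustrated with a small sketch. The README does not state the exact loss, so the teacher-student mean-squared-error form below is an assumption, and the names `mse`, `distillation_loss`, `teacher_en`, `student_en`, and `student_bg` are hypothetical. The vectors are toy values, not real model outputs.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_en: Vector, student_en: Vector, student_bg: Vector) -> float:
    """Assumed objective: pull the student's embedding of both the English
    sentence and its Bulgarian translation toward the teacher's embedding
    of the English sentence."""
    return mse(teacher_en, student_en) + mse(teacher_en, student_bg)


# Toy embeddings (illustrative values only).
teacher_en = [0.2, -0.1, 0.7]

# Student maps the sentence and its translation onto the teacher's vector.
aligned = distillation_loss(teacher_en, [0.2, -0.1, 0.7], [0.2, -0.1, 0.7])

# Student places the Bulgarian translation far from the teacher's vector.
drifted = distillation_loss(teacher_en, [0.2, -0.1, 0.7], [0.9, 0.4, -0.3])
```

Under this assumed objective the loss is zero only when the Bulgarian translation and the English original share the teacher's embedding, which is the property the README describes.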