kornwtp/ConGen-paraphrase-multilingual-mpnet-base-v2

This is a ConGen model: it maps sentences to a 768-dimensional dense vector space and can be used for tasks like semantic search.

Usage

Using this model becomes easy when you have ConGen installed:

pip install -U git+https://github.com/KornWtp/ConGen.git

Then you can use the model like this:

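from sentence_transformers import SentenceTransformer

# Two Thai example sentences with closely related meanings:
#   "A group of men play football on the beach"
#   "A group of boys are playing football on the beach"
sentences = ["กลุ่มผู้ชายเล่นฟุตบอลบนชายหาด", "กลุ่มเด็กชายกำลังเล่นฟุตบอลบนชายหาด"]

# Load the model by its Hub ID, then encode each sentence into a 768-dimensional vector
model = SentenceTransformer('kornwtp/ConGen-paraphrase-multilingual-mpnet-base-v2')
embeddings = model.encode(sentences)
print(embeddings)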
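
For semantic search, one common approach is to rank candidate sentences by the cosine similarity between their embeddings and a query embedding. The sketch below is a minimal illustration and not part of ConGen itself: the helper names cosine_similarity and rank_by_similarity are hypothetical, and toy 3-dimensional vectors stand in for the 768-dimensional output of model.encode so that the snippet runs without downloading the model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists or 1-D arrays)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): placeholders for rows of model.encode(sentences).
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

ranking = rank_by_similarity(query, corpus)
for index, score in ranking:
    print(index, round(score, 4))
```

With the real model, calling rank_by_similarity(embeddings[0], embeddings[1:]) would score the second example sentence against the first. The sentence_transformers package also provides util.cos_sim, which computes the same quantity on tensors and is likely the better choice for larger corpora.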

Evaluation Results

For an automated evaluation of this model, see the Thai Sentence Embeddings Benchmark: Semantic Textual Similarity.

Citing & Authors

@inproceedings{limkonchotiwat-etal-2022-congen,
    title = "{ConGen}: Unsupervised Control and Generalization Distillation For Sentence Representation",
    author = "Limkonchotiwat, Peerat  and
      Ponwitayarat, Wuttikorn  and
      Lowphansirikul, Lalita and
      Udomcharoenchaikit, Can  and
      Chuangsuwanich, Ekapol  and
      Nutanong, Sarana",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    year = "2022",
    publisher = "Association for Computational Linguistics",
}