projecte-aina
/

roberta-base-ca-cased-sts

Text Classification

semantic textual similarity

Catalan Textual Corpus

Inference Endpoints

Model card Files Files and versions Community

bsc-temu commited on Nov 26, 2021

Commit

c1c2dea

•

1 Parent(s): 7903afe

Update README.md

Files changed (1) hide show

README.md +95 -0

README.md CHANGED Viewed

	@@ -0,0 +1,95 @@

+---
+language:
+- ca
+license: ???
+tags:
+- "catalan"
+- "semantic textual similarity"
+- "sts-ca"
+- "CaText"
+- "Catalan Textual Corpus"
+datasets:
+- "projecte-aina/sts-ca"
+metrics:
+- "pearson"
+model-index:
+- name: roberta-base-ca-cased-sts
+  results:
+  - task:
+      type: text-classification  # Required. Example: automatic-speech-recognition
+    dataset:
+      type:   projecte-aina/sts-ca
+      name: sts-ca
+    metrics:
+      - type: pearson
+        value: 0.8120486139447483
+widget:
+- text: "M'agrades. T'estimo."
+- text: "M'agrada el sol i la calor. A la Garrotxa plou molt."
+- text: "El llibre va caure per la finestra. El llibre va sortir volant."
+- text: "El meu aniversari és el 23 de maig. Faré anys a finals de maig."
+---
+# Catalan BERTa (RoBERTa-base) finetuned for Textual Entailment.
+The **roberta-base-ca-cased-sts** is a Semantic Textual Similarity (STS) model for the Catalan language fine-tuned from the [BERTa](https://huggingface.co/PlanTL-GOB-ES/roberta-base-ca) model, a [RoBERTa](https://arxiv.org/abs/1907.11692) base model pre-trained on a medium-size corpus collected from publicly available corpora and crawlers (check the BERTa model card for more details).
+## Datasets
+We used the TE dataset in Catalan called [STS-ca](https://huggingface.co/datasets/projecte-aina/sts-ca) for training and evaluation.
+## Evaluation and results
+Below, the evaluation result on the STS-ca test set:
+| Task        | STS-ca (pearson) |
+| ------------|:----|
+| BERTa       | **81.20** |
+For more details, check the fine-tuning and evaluation scripts in the official [GitHub repository](https://github.com/projecte-aina/berta).
+## Citing
+If you use any of these resources (datasets or models) in your work, please cite our latest paper:
+```bibtex
+@inproceedings{armengol-estape-etal-2021-multilingual,
+    title = "Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? {A} Comprehensive Assessment for {C}atalan",
+    author = "Armengol-Estap{\'e}, Jordi  and
+      Carrino, Casimiro Pio  and
+      Rodriguez-Penagos, Carlos  and
+      de Gibert Bonet, Ona  and
+      Armentano-Oller, Carme  and
+      Gonzalez-Agirre, Aitor  and
+      Melero, Maite  and
+      Villegas, Marta",
+    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
+    month = aug,
+    year = "2021",
+    address = "Online",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2021.findings-acl.437",
+    doi = "10.18653/v1/2021.findings-acl.437",
+    pages = "4933--4946",
+}
+```
+## Funding
+TODO
+## Disclaimer
+TODO