TurkuNLP/xlmr-qa-register


annieske committed on Nov 2, 2023

Commit c5ce7a5 • 1 Parent(s): 3d78161

Update README.md

Files changed (1)

README.md +41 -0

README.md CHANGED

@@ -1,3 +1,44 @@
 ---
 license: cc-by-sa-4.0
 ---

+### xlm-roberta-base for register labeling, specifically fine-tuned for question-answer document identification
+This is `xlm-roberta-base` fine-tuned on register-annotated data in English (https://github.com/TurkuNLP/CORE-corpus) and Finnish (https://github.com/TurkuNLP/FinCORE_full), as well as unpublished Swedish and French data (https://github.com/TurkuNLP/multilingual-register-labeling). The model is trained to predict whether or not a text contains question-answer content.
+### Overview
+Language model: xlm-roberta-base
+Downstream-task: multi-class text classification
+### Usage
+The model can be used through a Hugging Face `transformers` pipeline:
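+```
+import transformers
+
+# Load the fine-tuned classifier; the tokenizer comes from the base model.
+model = transformers.AutoModelForSequenceClassification.from_pretrained("TurkuNLP/xlmr-qa-register")
+tokenizer = transformers.AutoTokenizer.from_pretrained("xlm-roberta-base")
+# Wrap both in a text-classification pipeline.
+pipe = transformers.pipeline(task="text-classification", model=model, tokenizer=tokenizer)
+```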
+### Hyperparameters
+```
+batch_size = 8
+epochs = 10 (configured; trained for 4)
+base_LM_model = "xlm-roberta-base"
+max_seq_len = 512
+learning_rate = 4e-6
+```
+### Performance
+```
+F1-micro = 0.98
+F1-macro = 0.79
+F1 QA label = 0.60
+F1 not QA label = 0.99
+Precision QA label = 0.82
+Precision not QA label = 0.99
+Recall QA label = 0.47
+Recall not QA label = 1.00
+```
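
The per-label F1 scores in the Performance block can be cross-checked against the listed precision and recall, since F1 is their harmonic mean. The sketch below is a reader-side sanity check, not part of the commit: the helper name `f1_score` and the `reported` dictionary are illustrative, and the inputs are the rounded two-decimal values from the README, so the recomputed numbers can only be expected to agree to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the Performance block of the README.
reported = {
    "QA": {"precision": 0.82, "recall": 0.47, "f1": 0.60},
    "not QA": {"precision": 0.99, "recall": 1.00, "f1": 0.99},
}
reported_macro_f1 = 0.79

# Recompute each label's F1 from its precision and recall.
recomputed = {
    label: f1_score(m["precision"], m["recall"]) for label, m in reported.items()
}

# Macro F1 is the unweighted mean of the per-label F1 scores.
macro_f1 = sum(recomputed.values()) / len(recomputed)

for label, value in recomputed.items():
    print(f"{label}: recomputed F1 = {value:.3f}, reported = {reported[label]['f1']:.2f}")
print(f"macro: recomputed F1 = {macro_f1:.3f}, reported = {reported_macro_f1:.2f}")
```

Both per-label scores match the README after rounding to two decimals. The macro average recomputed from rounded inputs differs from the reported 0.79 by less than 0.01, which is consistent with the original having been computed from unrounded values. The low QA recall (0.47) is what pulls the QA F1 well below its precision.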