5roop committed on
Commit f5b8a4e
1 Parent(s): cfce117

Update README.md

Files changed (1)
  1. README.md +41 -1
README.md CHANGED
@@ -1,3 +1,43 @@
  ---
- license: cc-by-nc-sa-4.0
+ license: cc-by-sa-4.0
+ language:
+ - hr
+ - bs
+ - sr
  ---
+ # XLM-R-SloBERTić
+
+ This model was produced by pre-training [XLM-Roberta-large](https://huggingface.co/xlm-roberta-large) for 48k steps on South Slavic languages.
+
+ # Benchmarking
+ Three tasks were chosen for model evaluation:
+ * Named Entity Recognition (NER)
+ * Sentiment regression
+ * COPA (Choice of Plausible Alternatives)
+
+
+ In all cases, this model was fine-tuned for the specific downstream task.
+ ## NER
+ (entry to be added soon)
+ ## Sentiment regression
+
+ The [ParlaSent dataset](https://huggingface.co/datasets/classla/ParlaSent) was used to evaluate sentiment regression for Bosnian, Croatian, and Serbian.
+ The procedure is explained in greater detail in the dedicated [benchmarking repository](https://github.com/clarinsi/benchich/tree/main/sentiment).
+
+ | system                                                                 | train               | test                     |   r^2 |
+ |:-----------------------------------------------------------------------|:--------------------|:-------------------------|------:|
+ | [xlm-r-parlasent](https://huggingface.co/classla/xlm-r-parlasent)      | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.615 |
+ | [BERTić](https://huggingface.co/classla/bcms-bertic)                   | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.612 |
+ | XLM-R-SloBERTić                                                        | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.607 |
+ | XLM-Roberta-Large                                                      | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.605 |
+ | **XLM-R-BERTić**                                                       | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.601 |
+ | [crosloengual-bert](https://huggingface.co/EMBEDDIA/crosloengual-bert) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.537 |
+ | XLM-Roberta-Base                                                       | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.500 |
+ | dummy (mean)                                                           | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | -0.12 |
+ ## COPA
+ (to be added soon)
+
+ # Citation
+ (to be added soon)
+ # Authors
+ * [Nikola Ljubešić](https://huggingface.co/nljubesi)
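
The `r^2` column in the sentiment regression table, and the negative score of the `dummy (mean)` row, can be read through the standard coefficient of determination. The sketch below is not taken from the benchmarking repository: it assumes the reported metric is the usual `1 - SS_res / SS_tot`, and the labels, predictions, and variable names are hypothetical toy values, not ParlaSent data. It shows why a baseline that always predicts the training-set mean can score below zero on a test set whose mean differs.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy sentiment scores (assumption: NOT ParlaSent data).
train_labels = [1.0, 2.0, 3.0, 4.0]
test_labels = [2.0, 3.0, 5.0, 6.0]

# Mean baseline: always predict the mean of the training labels.
baseline = mean(train_labels)
dummy_preds = [baseline] * len(test_labels)
dummy_r2 = r2_score(test_labels, dummy_preds)

# Hypothetical predictions from a fine-tuned regressor.
model_preds = [2.5, 3.0, 4.5, 5.5]
model_r2 = r2_score(test_labels, model_preds)

print(f"dummy (mean) r^2: {dummy_r2:.3f}")
print(f"model r^2:        {model_r2:.3f}")
```

Because `SS_tot` is computed around the test-set mean, any constant prediction other than that mean gives `SS_res > SS_tot`, so the score drops below zero. A score of exactly 0 would require the training and test means to coincide.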