---
license: cc-by-sa-4.0
language:
- hr
- bs
- sr
---

# XLM-R-BERTić

This model was produced by further pre-training [XLM-Roberta-large](https://huggingface.co/xlm-roberta-large) for 48k steps on South Slavic languages.

# Benchmarking

Three tasks were chosen for model evaluation:

* Named Entity Recognition (NER)
* Sentiment regression
* COPA (Choice of Plausible Alternatives)

In all cases, the model was fine-tuned separately for each downstream task.

## NER

(entry to be added soon)

## Sentiment regression

The [ParlaSent dataset](https://huggingface.co/datasets/classla/ParlaSent) was used to evaluate sentiment regression for Bosnian, Croatian, and Serbian. The procedure is explained in more detail in the dedicated [benchmarking repository](https://github.com/clarinsi/benchich/tree/main/sentiment).

| system | train | test | R² |
|:-----------------------------------------------------------------------|:--------------------|:-------------------------|------:|
| [xlm-r-parlasent](https://huggingface.co/classla/xlm-r-parlasent) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.615 |
| [BERTić](https://huggingface.co/classla/bcms-bertic) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.612 |
| XLM-R-SloBERTić | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.607 |
| XLM-Roberta-Large | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.605 |
| **XLM-R-BERTić** | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.601 |
| [crosloengual-bert](https://huggingface.co/EMBEDDIA/crosloengual-bert) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.537 |
| XLM-Roberta-Base | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | 0.500 |
| dummy (mean) | ParlaSent_BCS.jsonl | ParlaSent_BCS_test.jsonl | -0.12 |

## COPA

(to be added soon)

# Citation

(to be added soon)

# Authors

* [Nikola Ljubešić](https://huggingface.co/nljubesi)