Finetuned XLM-RoBERTa BASE model on Thai sequence and token classification datasets
The script and documentation can be found at this repository.
Model description
We use the pretrained cross-lingual RoBERTa model as proposed by [Conneau et al., 2020]. We download the pretrained PyTorch model via HuggingFace's Model Hub (https://huggingface.co/xlm-roberta-base).
Intended uses & limitations
You can use the finetuned models for multiclass/multilabel text classification and token classification tasks.
Multiclass text classification
wisesight_sentiment
4-class text classification task (positive, neutral, negative, and question) based on social media posts and tweets.

wongnai_reviews
Users' review rating classification task (scale ranging from 1 to 5).

generated_reviews_enth
Generated users' review rating classification task (scale ranging from 1 to 5), with review_star as the label.
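As a sketch of how such a finetuned checkpoint could be loaded for multiclass inference (the model id below is a placeholder, not an actual Hub identifier):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint id; substitute the actual finetuned model from the Hub.
model_name = "finetuned-xlm-roberta-base-wisesight_sentiment"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("อาหารอร่อยมาก", return_tensors="pt")  # "The food is delicious."
with torch.no_grad():
    logits = model(**inputs).logits

# Multiclass: take the argmax over the class logits as the prediction.
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. "positive"
```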
Multilabel text classification
prachathai67k
Thai topic classification with 12 labels, based on a news article corpus from prachathai.com. Details are described on this page.
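For the multilabel case, a per-label sigmoid with a threshold replaces the single argmax. A minimal sketch, again with a hypothetical checkpoint id:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint id for the prachathai67k finetuned model.
model_name = "finetuned-xlm-roberta-base-prachathai67k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("ข่าวการเมืองวันนี้", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Multilabel: each of the 12 topics is scored independently, so apply a
# sigmoid per label and keep every label above a chosen threshold.
probs = torch.sigmoid(logits)[0]
labels = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(labels)
```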
Token classification
thainer
Named-entity recognition tagging with 13 named-entities as described on this page.
lst20
NER and POS tagging: Named-entity recognition tagging with 10 named-entities and Part-of-Speech tagging with 16 tags as described on this page.
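A token-classification pipeline can be pointed at either NER checkpoint; the sketch below assumes a hypothetical thainer model id and merges subword pieces into whole entity spans:

```python
from transformers import pipeline

# Placeholder checkpoint id; substitute the actual finetuned NER model.
ner = pipeline(
    "token-classification",
    model="finetuned-xlm-roberta-base-thainer",
    aggregation_strategy="simple",  # merge subword pieces into entity spans
)

print(ner("นายกรัฐมนตรีเดินทางไปเชียงใหม่"))
# -> list of dicts with entity_group, score, word, start, end
```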
How to use
The example notebook demonstrating how to use the finetuned model for inference can be found in this Colab notebook.
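For a quick start, the text-classification pipeline wraps tokenization and label decoding in a single call; the model id here is again a placeholder:

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="finetuned-xlm-roberta-base-wisesight_sentiment",  # placeholder id
)
print(classifier("บริการดีมากครับ"))  # e.g. [{'label': 'positive', 'score': 0.98}]
```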
BibTeX entry and citation info
@misc{lowphansirikul2021wangchanberta,
title={WangchanBERTa: Pretraining transformer-based Thai Language Models},
author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
year={2021},
eprint={2101.09635},
archivePrefix={arXiv},
primaryClass={cs.CL}
}