Finetuned XLM-RoBERTa BASE model on Thai sequence and token classification datasets
The script and documentation can be found at this repository.
Model description
We use the pretrained cross-lingual RoBERTa model as proposed by [Conneau et al., 2020]. We download the pretrained PyTorch model via HuggingFace's Model Hub (https://huggingface.co/xlm-roberta-base).
Intended uses & limitations
You can use the finetuned models for multiclass/multilabel text classification and token classification tasks.
Multiclass text classification
wisesight_sentiment
4-class text classification task (positive, neutral, negative, and question) based on social media posts and tweets.

wongnai_reviews
Users' review rating classification task (scale ranging from 1 to 5).

generated_reviews_enth
Generated users' review rating classification task (scale ranging from 1 to 5), with review_star as the label.
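As a sketch of how such a finetuned checkpoint could be loaded for multiclass inference (the model id below is a placeholder, not an actual Hub identifier):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint id; substitute the actual finetuned model from the Hub.
model_name = "finetuned-xlm-roberta-base-wisesight_sentiment"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("อาหารอร่อยมาก", return_tensors="pt")  # "The food is delicious."
with torch.no_grad():
    logits = model(**inputs).logits

# Multiclass: take the argmax over the class logits as the prediction.
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])  # e.g. "positive"
```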
Multilabel text classification
prachathai67k
Thai topic classification with 12 labels, based on a news article corpus from prachathai.com. Details are described on this page.
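For the multilabel case, a per-label sigmoid with a threshold replaces the single argmax. A minimal sketch, again with a hypothetical checkpoint id:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint id for the prachathai67k finetuned model.
model_name = "finetuned-xlm-roberta-base-prachathai67k"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("ข่าวการเมืองวันนี้", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Multilabel: each of the 12 topics is scored independently, so apply a
# sigmoid per label and keep every label above a chosen threshold.
probs = torch.sigmoid(logits)[0]
labels = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(labels)
```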
Token classification
thainer
Named-entity recognition tagging with 13 named-entities as described on this page.
lst20
NER and POS tagging: Named-entity recognition tagging with 10 named-entities and Part-of-Speech tagging with 16 tags as described on this page.
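A token-classification pipeline can be pointed at either NER checkpoint; the sketch below assumes a hypothetical thainer model id and merges subword pieces into whole entity spans:

```python
from transformers import pipeline

# Placeholder checkpoint id; substitute the actual finetuned NER model.
ner = pipeline(
    "token-classification",
    model="finetuned-xlm-roberta-base-thainer",
    aggregation_strategy="simple",  # merge subword pieces into entity spans
)

print(ner("นายกรัฐมนตรีเดินทางไปเชียงใหม่"))
# -> list of dicts with entity_group, score, word, start, end
```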
How to use
The example notebook demonstrating how to use the finetuned model for inference can be found in this Colab notebook.
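For a quick start, the text-classification pipeline wraps tokenization and label decoding in a single call; the model id here is again a placeholder:

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="finetuned-xlm-roberta-base-wisesight_sentiment",  # placeholder id
)
print(classifier("บริการดีมากครับ"))  # e.g. [{'label': 'positive', 'score': 0.98}]
```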
BibTeX entry and citation info
@misc{lowphansirikul2021wangchanberta,
title={WangchanBERTa: Pretraining transformer-based Thai Language Models},
author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
year={2021},
eprint={2101.09635},
archivePrefix={arXiv},
primaryClass={cs.CL}
}