PhoBERT: Pre-trained language models for Vietnamese
Pre-trained PhoBERT models are the state-of-the-art language models for Vietnamese (Pho, i.e. "Phแป", is a popular food in Vietnam):
- Two PhoBERT versions of "base" and "large" are the first public large-scale monolingual language models pre-trained for Vietnamese. PhoBERT pre-training approach is based on RoBERTa which optimizes the BERT pre-training procedure for more robust performance.
- PhoBERT outperforms previous monolingual and multilingual approaches, obtaining new state-of-the-art performances on four downstream Vietnamese NLP tasks of Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference.
The general architecture and experimental results of PhoBERT can be found in our EMNLP-2020 Findings paper:
@article{phobert,
title = {{PhoBERT: Pre-trained language models for Vietnamese}},
author = {Dat Quoc Nguyen and Anh Tuan Nguyen},
journal = {Findings of EMNLP},
year = {2020}
}
Please CITE our paper when PhoBERT is used to help produce published results or is incorporated into other software.
For further information or requests, please go to PhoBERT's homepage!
- Downloads last month
- 7,129
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.