---
language: 
- zh
- bo
- kk
- ko
- mn
- ug
- yue
license: "apache-2.0"
---

## CINO: Pre-trained Language Models for Chinese Minority Languages

Multilingual pre-trained language models, such as mBERT and XLM-R, provide multilingual and cross-lingual abilities for language understanding.
We have seen rapid progress on building multilingual PLMs in recent years.
However, there is a lack of work on building PLMs for Chinese minority languages, which hinders researchers from building powerful NLP systems for them.

To address the absence of Chinese minority PLMs, the Joint Laboratory of HIT and iFLYTEK Research (HFL) proposes CINO (Chinese-miNOrity pre-trained language model), which is built on XLM-R with additional pre-training on Chinese minority-language corpora, such as Tibetan, Mongolian (Uighur form), Uyghur, Kazakh (Arabic form), Korean, Zhuang, and Cantonese.
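
Because CINO is built on XLM-R, it can be loaded with the standard Hugging Face Transformers classes. The sketch below shows minimal feature extraction; the repo id `hfl/cino-large` is an assumption for illustration — substitute the model name of this card.

```python
# Minimal usage sketch with Hugging Face Transformers.
# CINO is XLM-R based, so the generic Auto* classes resolve to XLM-RoBERTa.
from transformers import AutoTokenizer, AutoModel

model_name = "hfl/cino-large"  # hypothetical repo id; replace with this card's model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode a sentence (Chinese here; Tibetan, Uyghur, etc. work the same way)
# and extract the contextual hidden states.
inputs = tokenizer("你好，世界", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
```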

Please see our GitHub repository for more details (in Chinese): https://github.com/ymcui/Chinese-Minority-PLM

You may also be interested in:

Chinese MacBERT: https://github.com/ymcui/MacBERT  
Chinese BERT series: https://github.com/ymcui/Chinese-BERT-wwm  
Chinese ELECTRA: https://github.com/ymcui/Chinese-ELECTRA  
Chinese XLNet: https://github.com/ymcui/Chinese-XLNet  
Knowledge Distillation Toolkit - TextBrewer: https://github.com/airaria/TextBrewer  

More resources by HFL: https://github.com/ymcui/HFL-Anthology