Fill-Mask
Transformers
PyTorch
xlm-roberta
Inference Endpoints
luciusssss committed ca66086 (1 parent: 5bb6a34)

Update README.md

Files changed (1): README.md (+31 −3)
README.md CHANGED
@@ -1,3 +1,31 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - pkupie/mc2_corpus
+ language:
+ - bo
+ - ug
+ - mn
+ - kk
+ ---
+ # MC^2XLMR-large
+ [GitHub Repo](https://github.com/luciusssss/mc2_corpus)
+
+
+ We continually pretrain XLM-RoBERTa-large with [MC^2](https://huggingface.co/datasets/pkupie/mc2_corpus), which supports Tibetan, Uyghur, Kazakh in the Kazakh Arabic script, and Mongolian in the traditional Mongolian script.
+
+
+ See details in the [paper](https://arxiv.org/abs/2311.08348).
+
+ ## Citation
+ ```
+ @misc{zhang2023mc2,
+       title={MC^2: A Multilingual Corpus of Minority Languages in China},
+       author={Chen Zhang and Mingxu Tao and Quzhe Huang and Jiuheng Lin and Zhibin Chen and Yansong Feng},
+       year={2023},
+       eprint={2311.08348},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL}
+ }
+ ```
+
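Since the model is tagged for the fill-mask task, it can be queried through the standard Transformers `fill-mask` pipeline. The sketch below is a minimal, hedged example: the Hub repo id `pkupie/mc2-xlmr-large`, the `make_prompt` helper, and the example sentence are assumptions for illustration (substitute the actual repo id shown on this model's page). XLM-RoBERTa tokenizers use `<mask>` as the mask token.

```python
# Minimal usage sketch (not part of the model card itself).
MASK_TOKEN = "<mask>"  # mask token used by XLM-RoBERTa tokenizers

def make_prompt(template: str, blank: str = "___") -> str:
    """Replace a human-readable placeholder with the model's mask token."""
    return template.replace(blank, MASK_TOKEN)

def top_fills(text: str, model_id: str = "pkupie/mc2-xlmr-large", k: int = 3):
    """Return the top-k fill-mask predictions as (token, score) pairs.

    The repo id above is an assumption; use the id on the model page.
    transformers is imported lazily so the pure-string helper above
    works without the heavy dependency installed.
    """
    from transformers import pipeline  # requires `pip install transformers`
    fill = pipeline("fill-mask", model=model_id)
    return [(p["token_str"], p["score"]) for p in fill(text)[:k]]

if __name__ == "__main__":
    prompt = make_prompt("Lhasa is the capital of ___.")
    print(prompt)  # "Lhasa is the capital of <mask>."
    # top_fills(prompt) downloads the checkpoint on first run.
```

In practice the cloze template would be written in one of the supported languages (Tibetan, Uyghur, Kazakh, or Mongolian); the English sentence here is only a placeholder.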