---
pipeline_tag: fill-mask
---

## XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

This repository contains an XLM-V checkpoint converted from fairseq to the Hugging Face Transformers format.
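
A minimal usage sketch with the `fill-mask` pipeline; the model id below is a placeholder for this repository's id, and the `<mask>` token follows the XLM-R convention:

```python
# Sketch only: "<this-repo-id>" is a placeholder for this repository's
# Hugging Face model id.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="<this-repo-id>")

# XLM-V uses "<mask>" as its mask token (same convention as XLM-R).
print(unmasker("Paris is the <mask> of France."))
```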

## Fairseq

If the original model is needed, the fairseq checkpoint is available at:
```
https://dl.fbaipublicfiles.com/fairseq/xlmv/xlmv.base.tar.gz
```
and usage instructions can be found at:
```
https://github.com/facebookresearch/fairseq/blob/main/examples/xlmr/README.md
```
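
For reference, a hedged sketch of loading the original fairseq checkpoint, assuming the XLM-R loading path from the linked README also applies to XLM-V (the local path is a placeholder):

```python
# Sketch only: assumes the XLM-R loading API from the linked fairseq README
# also works for the extracted xlmv.base archive; the path is a placeholder.
from fairseq.models.roberta import XLMRModel

xlmv = XLMRModel.from_pretrained("/path/to/xlmv.base", checkpoint_file="model.pt")
xlmv.eval()  # disable dropout for evaluation

tokens = xlmv.encode("Hello world!")
features = xlmv.extract_features(tokens)  # last-layer features
```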

**Note: please use the official checkpoints if/when they are added to Transformers** (this repository is for personal usage/experiments).

## Citation
```
@misc{https://doi.org/10.48550/arxiv.2301.10472,
  doi = {10.48550/ARXIV.2301.10472},
  url = {https://arxiv.org/abs/2301.10472},
  author = {Liang, Davis and Gonen, Hila and Mao, Yuning and Hou, Rui and Goyal, Naman and Ghazvininejad, Marjan and Zettlemoyer, Luke and Khabsa, Madian},
  keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
```