mms
vineelpratap committed on
Commit
92a7c8a
1 Parent(s): a0e83b7

Update README.md

Files changed (1)
  1. README.md +160 -0
README.md CHANGED
@@ -1,3 +1,163 @@
1
  ---
2
  license: cc-by-nc-4.0
3
  ---
1
  ---
2
  license: cc-by-nc-4.0
3
+ tags:
4
+ - mms
5
  ---
6
+
7
+ # Massively Multilingual Speech (MMS) - Common Crawl Language Models
8
+
9
+ This repository contains n-gram language models trained on Common Crawl data ([Conneau et al. 2020b](https://aclanthology.org/2020.acl-main.747/), [NLLB Team et al. 2022](https://arxiv.org/abs/2207.04672)) using the [KenLM library](https://github.com/kpu/kenlm).
10
+
11
+ ## Table of Contents
12
+
13
+ - [Example](#example)
14
+ - [Supported Languages](#supported-languages)
15
+ - [Model details](#model-details)
16
+ - [Additional links](#additional-links)
17
+
18
+ ## Example
19
+
20
+ ```py
21
+
22
+ TODO
23
+ ```
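
Until the example above is filled in, the following is a minimal sketch of one way an n-gram language model like these might be used: rescoring candidate transcripts from a speech recognizer. It is not the authors' decoding setup. The `rescore` helper, the `lm_weight` value, and the model path are all hypothetical; the only KenLM calls assumed are `kenlm.Model(path)` and `Model.score(sentence, bos=True, eos=True)`, which returns a log10 probability.

```python
from typing import Callable, Iterable, List, Tuple


def rescore(
    hypotheses: Iterable[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,  # hypothetical weight; tune on held-out data
) -> List[Tuple[str, float]]:
    """Add a weighted LM score to each hypothesis' acoustic score.

    Returns (text, combined_score) pairs sorted best-first.
    """
    rescored = [
        (text, am_score + lm_weight * lm_score(text))
        for text, am_score in hypotheses
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def load_kenlm_scorer(model_path: str) -> Callable[[str], float]:
    """Wrap a KenLM model file as a sentence scorer (log10 probability)."""
    import kenlm  # imported lazily so rescore() works without KenLM installed

    model = kenlm.Model(model_path)
    return lambda sentence: model.score(sentence, bos=True, eos=True)
```

With KenLM installed and a model file downloaded from this repository, `load_kenlm_scorer("path/to/model.bin")` (a placeholder path) could be passed as `lm_score`. Input text should be tokenized the same way as the model's training data.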
24
+
25
+ ## Supported Languages
26
+
27
+ We provide language models for 102 languages. Click the following to toggle the list of all supported languages, given as [ISO 639-3 codes](https://en.wikipedia.org/wiki/ISO_639-3).
28
+ You can find more details about the languages and their ISO 639-3 codes in the [MMS Language Coverage Overview](https://dl.fbaipublicfiles.com/mms/misc/language_coverage_mms.html).
29
+ <details>
30
+ <summary>Click to toggle</summary>
31
+
32
+ - afr
33
+ - amh
34
+ - ara
35
+ - asm
36
+ - ast
37
+ - azj
38
+ - bel
39
+ - ben
40
+ - bos
41
+ - bul
42
+ - cat
43
+ - ceb
44
+ - ces
45
+ - ckb
46
+ - cmn
47
+ - cym
48
+ - dan
49
+ - deu
50
+ - ell
51
+ - eng
52
+ - est
53
+ - fas
54
+ - fin
55
+ - fra
56
+ - ful
57
+ - gle
58
+ - glg
59
+ - guj
60
+ - hau
61
+ - heb
62
+ - hin
63
+ - hrv
64
+ - hun
65
+ - hye
66
+ - ibo
67
+ - ind
68
+ - isl
69
+ - ita
70
+ - jav
71
+ - jpn
72
+ - kam
73
+ - kan
74
+ - kat
75
+ - kaz
76
+ - kea
77
+ - khm
78
+ - kir
79
+ - kor
80
+ - lao
81
+ - lav
82
+ - lin
83
+ - lit
84
+ - ltz
85
+ - lug
86
+ - luo
87
+ - mal
88
+ - mar
89
+ - mkd
90
+ - mlt
91
+ - mon
92
+ - mri
93
+ - mya
94
+ - nld
95
+ - nob
96
+ - npi
97
+ - nso
98
+ - nya
99
+ - oci
100
+ - orm
101
+ - ory
102
+ - pan
103
+ - pol
104
+ - por
105
+ - pus
106
+ - ron
107
+ - rus
108
+ - slk
109
+ - slv
110
+ - sna
111
+ - snd
112
+ - som
113
+ - spa
114
+ - srp
115
+ - swe
116
+ - swh
117
+ - tam
118
+ - tel
119
+ - tgk
120
+ - tgl
121
+ - tha
122
+ - tur
123
+ - ukr
124
+ - umb
125
+ - urd
126
+ - uzb
127
+ - vie
128
+ - wol
129
+ - xho
130
+ - yor
131
+ - yue
132
+ - zlm
133
+ - zul
134
+ </details>
135
+
136
+ ## Model details
137
+
138
+ - **Developed by:** Vineel Pratap et al.
139
+ - **Model type:** n-gram language models (KenLM) for multilingual automatic speech recognition
140
+ - **Language(s):** 102 languages, see [supported languages](#supported-languages)
141
+ - **License:** CC-BY-NC 4.0 license
142
+ - **Num parameters**: 1 billion
143
+ - **Audio sampling rate**: 16,000 Hz
144
+ - **Cite as:**
145
+
146
+ @article{pratap2023mms,
147
+ title={Scaling Speech Technology to 1,000+ Languages},
148
+ author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},
149
+ journal={arXiv},
150
+ year={2023}
151
+ }
152
+
153
+ ## Additional Links
154
+
155
+ - [Blog post](https://ai.facebook.com/blog/multilingual-model-speech-recognition/)
156
+ - [Transformers documentation](https://huggingface.co/docs/transformers/main/en/model_doc/mms)
157
+ - [Paper](https://arxiv.org/abs/2305.13516)
158
+ - [GitHub Repository](https://github.com/facebookresearch/fairseq/tree/main/examples/mms#asr)
159
+ - [Other **MMS** checkpoints](https://huggingface.co/models?other=mms)
160
+ - MMS base checkpoints:
161
+   - [facebook/mms-1b](https://huggingface.co/facebook/mms-1b)
162
+   - [facebook/mms-300m](https://huggingface.co/facebook/mms-300m)
163
+ - [Official Space](https://huggingface.co/spaces/facebook/MMS)