Tags: Transformers · PyTorch · wav2vec2 · pretraining · speech · xls_r · xls_r_pretrained · Inference Endpoints
patrickvonplaten committed
Commit 618f1bb
1 parent: 06ba4de

Update README.md

Files changed (1): README.md (+4 −3)
README.md CHANGED
@@ -13,8 +13,9 @@ license: apache-2.0
 
 ![model image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/xls_r.png)
 
+![model image](https://raw.githubusercontent.com/patrickvonplaten/scientific_images/master/xls_r.png)
 
-[Facebook's Wav2Vec2 XLS-R](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) containing **300 million** parameters.
+[Facebook's Wav2Vec2 XLS-R](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) counting **300 million** parameters.
 
 XLS-R is Facebook AI's large-scale multilingual pretrained model for speech (the "XLM-R for Speech"). It is pretrained on 436k hours of unlabeled speech, including VoxPopuli, MLS, CommonVoice, BABEL, and VoxLingua107. It uses the wav2vec 2.0 objective, in 128 languages. When using the model make sure that your speech input is sampled at 16kHz.
 
@@ -34,7 +35,7 @@ The original model can be found under https://github.com/pytorch/fairseq/tree/ma
 See [this google colab](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLS_R_on_Common_Voice.ipynb) for more information on how to fine-tune the model.
 
 You can find other pretrained XLS-R models with different numbers of parameters:
+
 * [300M parameters version](https://huggingface.co/facebook/wav2vec2-xls-r-300m)
 * [1B parameters version](https://huggingface.co/facebook/wav2vec2-xls-r-1b)
-* [2B parameters version](https://huggingface.co/facebook/wav2vec2-xls-r-2b)
-
+* [2B parameters version](https://huggingface.co/facebook/wav2vec2-xls-r-2b)
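The README above stresses that the model expects speech input sampled at 16 kHz. As context beyond this commit, here is a minimal sketch of loading the checkpoint with the transformers library and running 16 kHz audio through it; the dummy waveform, the choice of `Wav2Vec2Model` for feature extraction, and the torchaudio resampling hint are illustrative assumptions, not part of the commit.

```python
# Minimal sketch (not part of this commit): load facebook/wav2vec2-xls-r-300m
# and extract hidden states from 16 kHz audio with the transformers library.
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2Model

model_id = "facebook/wav2vec2-xls-r-300m"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id)
model.eval()

# The card requires 16 kHz input; resample first if your audio differs,
# e.g. torchaudio.functional.resample(waveform, orig_freq, 16_000).
sampling_rate = 16_000
waveform = torch.randn(sampling_rate)  # 1 second of dummy audio as a stand-in

inputs = feature_extractor(
    waveform.numpy(), sampling_rate=sampling_rate, return_tensors="pt"
)
with torch.no_grad():
    outputs = model(**inputs)

# Contextual features: one vector per ~20 ms frame, hidden size 1024 for
# this 300M (large-architecture) checkpoint.
print(outputs.last_hidden_state.shape)
```

Because this checkpoint is pretraining-only, the output is a sequence of speech representations rather than transcriptions; for tasks such as speech recognition it should first be fine-tuned, for example following the Colab notebook linked in the README.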