Commit f22c971 (verified) · committed by ylacombe · Parent(s): a7dd3f3

Update README.md

Files changed (1): README.md (+29 −0)
@@ -111,6 +111,35 @@ This model was pre-trained on 4.5M hours of unlabeled audio data covering more t
 
**This model and its training are supported by 🤗 Transformers, more on it in the [docs](https://huggingface.co/docs/transformers/main/en/model_doc/wav2vec2-bert).**
 
+ # 🤗 Transformers usage
+ 
+ This is a bare checkpoint without any modeling head, and thus requires fine-tuning to be used for downstream tasks such as ASR. You can, however, use it to extract audio embeddings from the top layer with this code snippet:
+ 
+ ```python
+ from transformers import AutoProcessor, Wav2Vec2BertModel
+ import torch
+ from datasets import load_dataset
+ 
+ dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
+ dataset = dataset.sort("id")
+ sampling_rate = dataset.features["audio"].sampling_rate
+ 
+ processor = AutoProcessor.from_pretrained("facebook/w2v-bert-2.0")
+ model = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0")
+ 
+ # the audio file is decoded on the fly
+ inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+ ```
+ 
+ To learn more about using the model, refer to the following resources:
+ - [its docs](https://huggingface.co/docs/transformers/main/en/model_doc/wav2vec2-bert)
+ - [a blog post showing how to fine-tune it on Mongolian ASR](https://huggingface.co/blog/fine-tune-w2v2-bert)
+ - [a training script example](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py)
+ 
# Seamless Communication usage
 
This model can be used in [Seamless Communication](https://github.com/facebookresearch/seamless_communication), where it was released.
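
The embedding snippet in the diff above returns frame-level hidden states, one vector per audio frame. A common follow-up step (not part of the original snippet) is to mean-pool over the time axis to get a single fixed-size utterance embedding. The sketch below illustrates this with a dummy tensor standing in for `outputs.last_hidden_state`, so it runs without downloading the checkpoint; the hidden size of 1024 matches this model's configuration, and the frame count of 200 is arbitrary:

```python
import torch

# Dummy stand-in for outputs.last_hidden_state with shape
# (batch, time_frames, hidden_size); w2v-bert-2.0 uses hidden_size=1024.
hidden_states = torch.randn(1, 200, 1024)

# Mean-pool over the time axis to obtain one fixed-size utterance embedding.
utterance_embedding = hidden_states.mean(dim=1)

print(utterance_embedding.shape)  # torch.Size([1, 1024])
```

Such pooled embeddings are a common input for downstream tasks like speaker or language identification; for ASR, the frame-level states are kept and a task-specific head is fine-tuned on top instead.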