Update README.md
README.md (CHANGED)
@@ -111,6 +111,35 @@ This model was pre-trained on 4.5M hours of unlabeled audio data covering more t
**This model and its training are supported by 🤗 Transformers, more on it in the [docs](https://huggingface.co/docs/transformers/main/en/model_doc/wav2vec2-bert).**

# 🤗 Transformers usage
This is a bare checkpoint without any modeling head, and thus requires fine-tuning to be used for downstream tasks such as ASR. You can, however, use it to extract audio embeddings from the top layer with the following code snippet:
```python
from transformers import AutoFeatureExtractor, Wav2Vec2BertModel
import torch
from datasets import load_dataset

# Load a small English speech sample for demonstration
dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
dataset = dataset.sort("id")
sampling_rate = dataset.features["audio"].sampling_rate

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")
model = Wav2Vec2BertModel.from_pretrained("facebook/w2v-bert-2.0")

# audio file is decoded on the fly; the feature extractor turns the raw waveform into model inputs
inputs = feature_extractor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
```
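Assuming the standard 🤗 Transformers base-model output, the top-layer embeddings mentioned above are returned as `outputs.last_hidden_state`, for example:

```python
# Frame-level embeddings from the top layer, shaped (batch, num_frames, hidden_size)
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```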
To learn more about how to use the model, refer to the following resources:
- [its docs](https://huggingface.co/docs/transformers/main/en/model_doc/wav2vec2-bert)
- [a blog post showing how to fine-tune it on Mongolian ASR](https://huggingface.co/blog/fine-tune-w2v2-bert)
- [a training script example](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py)
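As a minimal sketch of what CTC fine-tuning for ASR looks like, the checkpoint can be loaded into `Wav2Vec2BertForCTC` and paired with a character-level tokenizer. The `vocab.json` below is an assumed placeholder for a vocabulary you build for your target language; see the blog post above for the full recipe:

```python
from transformers import (
    SeamlessM4TFeatureExtractor,
    Wav2Vec2BertForCTC,
    Wav2Vec2BertProcessor,
    Wav2Vec2CTCTokenizer,
)

# "vocab.json" is assumed to exist and map the characters of your target language to ids
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
feature_extractor = SeamlessM4TFeatureExtractor.from_pretrained("facebook/w2v-bert-2.0")
processor = Wav2Vec2BertProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Load the bare checkpoint with a randomly initialized CTC head sized to your vocabulary
model = Wav2Vec2BertForCTC.from_pretrained(
    "facebook/w2v-bert-2.0",
    ctc_loss_reduction="mean",
    pad_token_id=tokenizer.pad_token_id,
    vocab_size=len(tokenizer),
)
# From here, train with a CTC data collator and the 🤗 Trainer (see the resources above).
```
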
# Seamless Communication usage
This model can be used in [Seamless Communication](https://github.com/facebookresearch/seamless_communication), where it was released.