Update README.md
README.md
CHANGED
@@ -101,11 +101,36 @@ The following hyperparameters were used during training:
 
 ### Inference Without Decoder
 
+```python
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCTC, AutoProcessor
+import torchaudio.functional as F
+
+
+model_id = "patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm"
+
+sample = next(iter(load_dataset("common_voice", "es", split="test", streaming=True)))
+resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+
+model = AutoModelForCTC.from_pretrained(model_id)
+processor = AutoProcessor.from_pretrained(model_id)
+
+input_values = processor(resampled_audio, return_tensors="pt").input_values
+
+with torch.no_grad():
+    logits = model(input_values).logits
+
+-prediction_ids = torch.argmax(logits, dim=-1)
+-transcription = processor.batch_decode(prediction_ids)
++transcription = processor.batch_decode(logits.numpy()).text
+```
+
 
 ### Inference With Decoder
 
 
-### Eval results
+### Eval results on Common Voice 7 "test":
 
-**Without LM**: 27.
+**Without LM**: 27.30 WER
 
-**With LM**:
+**With LM (run `./eval.py`)**:
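
Note on the snippet added in this change: it contrasts two decoding paths. `torch.argmax(logits, dim=-1)` followed by `processor.batch_decode(prediction_ids)` is greedy CTC decoding, while `processor.batch_decode(logits.numpy()).text` passes the raw logits to the LM-backed decoder. The greedy path can be sketched without downloading the model, as below. This is a toy illustration only: the function name `greedy_ctc_decode`, the five-token vocabulary, the logits, and the choice of index 0 as the CTC blank are all assumptions made for the example and are not taken from this repository.

```python
# Toy sketch of greedy (argmax) CTC decoding, i.e. the "without decoder" path.
# The vocabulary, logits, and blank index are invented for illustration.

def greedy_ctc_decode(logits, vocab, blank_id=0):
    """Pick the best token per frame, merge repeats, then drop blanks."""
    # Argmax over the vocabulary axis for every time step.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]

    decoded = []
    previous = None
    for token_id in best_ids:
        # CTC merges consecutive duplicates first, then removes the blank token.
        if token_id != previous and token_id != blank_id:
            decoded.append(vocab[token_id])
        previous = token_id
    return "".join(decoded)


# Hypothetical vocabulary: index 0 plays the role of the CTC blank.
toy_vocab = ["<pad>", "h", "o", "l", "a"]

# One row per audio frame, one column per vocabulary entry.
toy_logits = [
    [0.1, 2.0, 0.0, 0.0, 0.0],  # "h"
    [0.1, 1.5, 0.0, 0.0, 0.0],  # "h" again, merged with the previous frame
    [3.0, 0.0, 0.0, 0.0, 0.0],  # blank
    [0.0, 0.0, 2.2, 0.0, 0.0],  # "o"
    [0.0, 0.0, 0.0, 1.9, 0.0],  # "l"
    [0.0, 0.0, 0.0, 0.0, 2.5],  # "a"
]

transcription = greedy_ctc_decode(toy_logits, toy_vocab)
print(transcription)
```

Greedy decoding looks at each frame in isolation, so it cannot use word-level context. The LM-backed path instead scores whole candidate sequences over the logits, which is why the README reports separate WER figures with and without the LM.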