hf-test/xls-r-300m-sv

Automatic Speech Recognition

Generated from Trainer

hf-asr-leaderboard

mozilla-foundation/common_voice_7_0

robust-speech-event


hf-test committed on Jan 10, 2022

Commit 488c40e • 1 Parent(s): 3ab2a05

Update README.md

Files changed (1)

README.md +28 -3

@@ -101,11 +101,36 @@ The following hyperparameters were used during training:
 ### Inference Without Decoder
 ### Inference With Decoder
-### Eval results (run `./eval.py`) on Common Voice 7 "test":
-**Without LM**: 27.28 WER
-**With LM**:

 ### Inference Without Decoder
+```python
+import torch
+from datasets import load_dataset
+from transformers import AutoModelForCTC, AutoProcessor
+import torchaudio.functional as F
+model_id = "patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm"
+sample = next(iter(load_dataset("common_voice", "es", split="test", streaming=True)))
+resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
+model = AutoModelForCTC.from_pretrained(model_id)
+processor = AutoProcessor.from_pretrained(model_id)
+input_values = processor(resampled_audio, return_tensors="pt").input_values
+with torch.no_grad():
+    logits = model(input_values).logits
+-prediction_ids = torch.argmax(logits, dim=-1)
+-transcription = processor.batch_decode(prediction_ids)
++transcription = processor.batch_decode(logits.numpy()).text
+```
 ### Inference With Decoder
+### Eval results on Common Voice 7 "test":
+**Without LM**: 27.30 WER
+**With LM (run `./eval.py`)**:
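
For readers comparing the two "Without LM" figures in this diff (27.28 and 27.30), here is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is not the repository's `./eval.py`, which is not shown in this commit. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization. The README figures appear to be percentages, so the returned fraction would be multiplied by 100 to compare.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Hypothetical helper for illustration; assumes whitespace tokenization
    and no casing or punctuation normalization.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(100 * word_error_rate("a b c d", "a x c"))
```

A corpus-level score is usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.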