---
license: cc-by-4.0
metrics:
- cer
pipeline_tag: automatic-speech-recognition
datasets:
- ivangtorre/second_americas_nlp_2022
tags:
- audio
- automatic-speech-recognition
- speech
- quechua
- xlsr-fine-tuning
model-index:
- name: Wav2Vec2 XLSR 300M Quechua Model by M Romero and Ivan G Torre
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Americas NLP 2022 Quechua
      type: second_americas_nlp_2022
      args: Quechua
    metrics:
    - name: Test CER
      type: cer
      value: 49.2
---
This model was fine-tuned from the 300M-parameter Wav2Vec 2.0 XLS-R model on the Quechua training partition of the Americas NLP 2022 dataset. The challenge took place during NeurIPS 2022.
## Example of usage
The model can be used directly (without a language model) as follows:
```python
import soundfile as sf
import torch
import torch.nn.functional as F
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load model and processor
processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")

# Path to the WAV file (XLS-R models are pretrained on 16 kHz audio)
pathfile = "/path/to/wavfile"

# Load the file as float32 samples
wav, curr_sample_rate = sf.read(pathfile, dtype="float32")
feats = torch.from_numpy(wav).float()

with torch.no_grad():
    # Normalize the whole waveform to zero mean and unit variance
    feats = F.layer_norm(feats, feats.shape)
    feats = torch.unsqueeze(feats, 0)  # add the batch dimension
    logits = model(feats).logits

# Take the argmax over the vocabulary and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print("HF prediction: ", transcription)
```
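The "take argmax and decode" step is greedy CTC decoding: pick the most likely label per frame, collapse consecutive repeats, then drop the blank label. The sketch below illustrates that idea in plain Python. The toy vocabulary, the `greedy_ctc_decode` helper, and the choice of index 0 as blank are hypothetical and only serve the illustration; the real mapping lives in the model's tokenizer, and `processor.batch_decode` handles it for you.

```python
from itertools import groupby

# Hypothetical toy vocabulary (not the model's real one).
# Index 0 plays the role of the CTC blank; "|" marks a word boundary.
BLANK_ID = 0
VOCAB = {0: "<pad>", 1: "a", 2: "l", 3: "p", 4: "|"}


def greedy_ctc_decode(frame_ids, vocab=VOCAB, blank_id=BLANK_ID, word_sep="|"):
    """Collapse repeated frame labels, drop blanks, and join into text."""
    # 1. Merge runs of identical labels (one label per run).
    collapsed = [label for label, _ in groupby(frame_ids)]
    # 2. Remove blanks; a blank between two equal labels keeps both of them.
    chars = [vocab[i] for i in collapsed if i != blank_id]
    # 3. Turn word separators into spaces.
    return "".join(chars).replace(word_sep, " ").strip()


# Per-frame argmax ids, as torch.argmax(logits, dim=-1) would yield for one utterance.
frame_ids = [1, 1, 0, 2, 2, 0, 2, 3, 0, 1, 4, 4]
print(greedy_ctc_decode(frame_ids))  # prints "allpa"
```

Note how the blank between the two runs of `l` is what allows a doubled letter to survive the collapsing step.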
This code snippet shows how to evaluate wav2vec2-xlsr-300m-quechua on the [Second Americas NLP 2022 Quechua dev set](https://huggingface.co/datasets/ivangtorre/second_americas_nlp_2022):
```python
from datasets import load_dataset
from jiwer import cer
import torch
import torch.nn.functional as F
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Load the dev split and keep only the Quechua subset
americasnlp = load_dataset("ivangtorre/second_americas_nlp_2022", "quechua", split="dev")
quechua = americasnlp.filter(lambda language: language['subset']=='quechua')

model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")


def map_to_pred(batch):
    # batch_size=1, so each batch holds a single audio example
    wav = batch["audio"][0]["array"]
    feats = torch.from_numpy(wav).float()
    feats = F.layer_norm(feats, feats.shape)  # Normalization performed during fine-tuning
    feats = torch.unsqueeze(feats, 0)
    with torch.no_grad():
        logits = model(feats).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    batch["transcription"] = processor.batch_decode(predicted_ids)
    return batch


result = quechua.map(map_to_pred, batched=True, batch_size=1)
print("CER:", cer(result["source_processed"], result["transcription"]))
```
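For readers unfamiliar with the metric, character error rate is the character-level edit distance between reference and hypothesis, divided by the number of reference characters. The following is a minimal pure-Python sketch of that definition. The `edit_distance` and `char_error_rate` helpers are hypothetical names written for this illustration; they are not `jiwer`'s implementation, and `jiwer` may apply its own preprocessing, so values can differ slightly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def char_error_rate(references, hypotheses):
    """Total character edits over total reference characters, as a fraction."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One missing character out of five reference characters.
print(char_error_rate(["allpa"], ["alpa"]))  # prints 0.2
```

The result is a fraction, so a value such as 0.2 corresponds to 20% when expressed as a percentage.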
## Citation
```bibtex
@article{romero2024asr,
title={ASR advancements for indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana},
author={Romero, Monica and Gomez, Sandra and Torre, Iv{\'a}n G},
journal={arXiv preprint arXiv:2404.08368},
year={2024}
}
```