ivangtorre committed
Commit d62e767
1 Parent(s): 0299e04

Update README.md

Files changed (1):
  1. README.md +16 -12

README.md CHANGED
@@ -4,7 +4,6 @@ language:
 - qu
 metrics:
 - cer
-- wer
 pipeline_tag: automatic-speech-recognition
 datasets:
 - ivangtorre/second_americas_nlp_2022
@@ -27,13 +26,15 @@ model-index:
     metrics:
     - name: Test CER
       type: cer
-      value: 11.11
-    - name: Test WER
-      type: wer
-      value: 11.11
+      value: 16.02
+
 ---
 
-## Usage
+This model was finetuned from a 300M-parameter Wav2vec2.0 XLS-R model on the Quechua train partition of the Americas NLP 2022 dataset. This challenge took place during NeurIPS 2022.
+
+
+
+## Example of usage
 
 The model can be used directly (without a language model) as follows:
 
@@ -46,11 +47,16 @@ import torchaudio
 processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
 model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
 
-# load dummy dataset and read soundfiles
-file = torchaudio.load("quechua000573.wav")
+# Path to the wav file
+pathfile = "/path/to/wavfile"
 
-# retrieve logits
-logits = model(file[0]).logits
+# Load and normalize the file
+wav, curr_sample_rate = sf.read(pathfile, dtype="float32")
+feats = torch.from_numpy(wav).float()
+with torch.no_grad():
+    feats = F.layer_norm(feats, feats.shape)
+feats = torch.unsqueeze(feats, 0)
+logits = model(feats).logits
 
 # take argmax and decode
 predicted_ids = torch.argmax(logits, dim=-1)
@@ -87,8 +93,6 @@ def map_to_pred(batch):
 result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1)
 
 print("CER:", cer(result["source_processed"], result["transcription"]))
-print("WER:", cer(result["source_processed"], result["transcription"]))
-
 ```
 
 ## Citation
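
For reference, here is the updated usage snippet assembled into one self-contained script. The hunks above show only the changed lines, so the imports are inferred and should be treated as assumptions: `sf` is taken to be `soundfile` (from the `sf.read` call), `F` to be `torch.nn.functional` (from `F.layer_norm`), and the final decode step follows the standard `transformers` CTC pattern rather than lines visible in this diff.

```python
import soundfile as sf           # assumed binding for sf.read
import torch
import torch.nn.functional as F  # assumed binding for F.layer_norm
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")

# Load the waveform (placeholder path) and layer-normalize it,
# matching the preprocessing added in this commit
wav, curr_sample_rate = sf.read("/path/to/wavfile", dtype="float32")
feats = torch.from_numpy(wav).float()
with torch.no_grad():
    feats = F.layer_norm(feats, feats.shape)
feats = torch.unsqueeze(feats, 0)
logits = model(feats).logits

# Take the argmax over the vocabulary and decode to text
# (standard transformers CTC decoding, not shown in the diff)
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))
```

The layer norm over the raw waveform reproduces the input normalization the XLS-R checkpoints were trained with; like all Wav2vec2.0 models, the checkpoint expects 16 kHz mono audio.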
 
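The final hunk removes a `print("WER:", ...)` line that was in fact computing CER under a WER label, consistent with dropping the WER metric from the metadata. Below is a minimal sketch of the evaluation loop that hunk touches, under stated assumptions: `cer` is taken from `jiwer`, the dataset config/split names and the `audio` column are guesses, and the body of `map_to_pred` is reconstructed, since the diff shows only its call site.

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset
from jiwer import cer  # assumed source of the cer() used in the README
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")
model = Wav2Vec2ForCTC.from_pretrained("ivangtorre/wav2vec2-xlsr-300m-quechua")

# Config and split names are assumptions, not confirmed by the diff
librispeech_eval = load_dataset("ivangtorre/second_americas_nlp_2022", "quechua", split="test")

def map_to_pred(batch):
    # Hypothetical body: normalize the waveform and transcribe it,
    # mirroring the inference snippet added in this commit
    feats = torch.tensor(batch["audio"][0]["array"]).float()
    with torch.no_grad():
        feats = F.layer_norm(feats, feats.shape)
        logits = model(feats.unsqueeze(0)).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    batch["transcription"] = processor.batch_decode(predicted_ids)
    return batch

# The .map(...) call and the CER print below appear verbatim in the diff
result = librispeech_eval.map(map_to_pred, batched=True, batch_size=1)
print("CER:", cer(result["source_processed"], result["transcription"]))
```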