Automatic Speech Recognition
Safetensors
Xhosa
whisper
audio
wjbmattingly committed on
Commit a3c0327 · verified · 1 Parent(s): a9d4da4

Update README.md

Files changed (1)
  1. README.md +71 -1
README.md CHANGED
@@ -16,4 +16,74 @@ metrics:
  - wer
  base_model: openai/whisper-small
  license: apache-2.0
- ---
+ ---
+
+ ---
+ language:
+ - xh
+ pipeline_tag: automatic-speech-recognition
+ tags:
+ - audio
+ - automatic-speech-recognition
+ widget:
+ - example_title: Librispeech sample 1
+   src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
+ - example_title: Librispeech sample 2
+   src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
+ datasets:
+ - Beijuka/xhosa_parakeet_50hr
+ metrics:
+ - wer
+ base_model: openai/whisper-small
+ ---
+
+ # Whisper-Small Fine-tuned for isiXhosa ASR
+
+ ## Model Description
+ This model is a fine-tuned version of OpenAI's Whisper-small for isiXhosa automatic speech recognition (ASR). It was trained on the NCHLT isiXhosa Speech Corpus to improve transcription accuracy on isiXhosa speech.
+
+ ## Performance
+ - Word Error Rate (WER): 32%
+
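The model card reports a WER of 32% but does not say which evaluation set or text normalization was used. As a rough illustration of what the metric measures, here is a minimal sketch that computes WER as the word-level edit distance divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not part of the original evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only: one of four reference words is missing.
print(word_error_rate("molo unjani namhlanje mhlobo", "molo unjani mhlobo"))  # 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. Published scores also depend on casing and punctuation normalization, so numbers are only comparable when those steps match.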
+ ## Base Model
+ - Name: openai/whisper-small
+ - Type: Automatic Speech Recognition (ASR)
+ - Original language: Multilingual
+
+ ## Usage
+ To use this model for inference:
+
+ ```python
+ from transformers import WhisperForConditionalGeneration, WhisperProcessor
+ import torch  # PyTorch backend, needed for return_tensors="pt"
+
+ # Load the fine-tuned model and its processor from the Hugging Face Hub
+ model = WhisperForConditionalGeneration.from_pretrained("TheirStory-Inc/whisper-small-xhosa")
+ processor = WhisperProcessor.from_pretrained("TheirStory-Inc/whisper-small-xhosa")
+
+ # Prepare your audio: a 1-D array of mono samples at a 16 kHz sampling rate
+ audio_input = ... # Load your audio file here
+
+ # Convert the waveform to log-mel input features
+ input_features = processor(audio_input, sampling_rate=16000, return_tensors="pt").input_features
+
+ # Generate token ids
+ predicted_ids = model.generate(input_features)
+
+ # Decode the token ids to text (returns a list with one string per input)
+ transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+
+ print(transcription[0])
+ ```
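The usage snippet leaves the audio-loading step open. As one possible way to fill it, here is a minimal sketch that reads a 16-bit PCM WAV file with only the Python standard library and returns mono samples scaled to [-1, 1]. The helper name `load_wav_mono_16k` is hypothetical, and the sketch assumes the file is already sampled at 16 kHz; it rejects other rates rather than resampling.

```python
import array
import sys
import wave
from typing import List


def load_wav_mono_16k(path: str) -> List[float]:
    """Read a 16-bit PCM WAV file sampled at 16 kHz and return mono floats."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        if wav.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sampling rate; resample the file first")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Average the channels of each frame and scale to the range [-1.0, 1.0).
    return [
        sum(samples[i:i + channels]) / channels / 32768.0
        for i in range(0, len(samples), channels)
    ]
```

The returned list can be assigned to `audio_input` in the snippet above, since the Whisper feature extractor accepts a list of floats as well as a NumPy array. For other formats or sampling rates, an audio library that can decode and resample is the more practical choice.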
+
+ ## Fine-tuning Dataset
+ - Name: NCHLT isiXhosa Speech Corpus
+ - Size: Approximately 56 hours of transcribed speech
+ - Speakers: 209 (106 female, 103 male)
+ - Content: Prompted speech (3-5 word utterances read from a smartphone screen)
+ - Source: Audio recorded on smartphones in non-studio environments
+ - License: Creative Commons Attribution 3.0 Unported License (CC BY 3.0)
+ - Citation:
+ ```text
+ De Vries, N.J., Davel, M.H., Badenhorst, J., Basson, W.D., de Wet, F., Barnard, E. and de Waal, A. (2014). A smartphone-based ASR data collection tool for under-resourced languages. Speech Communication, 56, 119-131. https://hdl.handle.net/20.500.12185/279
+ ```