---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
widget:
- example_title: Sample Iban audio
  src: ibf_003_014.wav
---

# Whisper Small for Bahasa Iban - Meisin Lee


This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the [Iban Speech Corpus](https://huggingface.co/datasets/meisin123/iban_speech_corpus).
Since Iban is not one of Whisper's supported languages, the model is fine-tuned from the **most similar** supported language, Malay.
It achieves the following results on the evaluation set:

- Loss: 0.257025
- WER (orthographic): 0.158626 (about 15.9%)
- WER: 0.158781 (about 15.9%)
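The WER figures above are fractions: the number of word substitutions, deletions and insertions needed to turn the transcript into the reference, divided by the number of reference words. Assuming the orthographic score is computed on raw text and the other score after text normalization (as in common Whisper fine-tuning recipes), the sketch below shows why the two can differ. The helpers `word_edit_distance`, `wer` and `simple_normalize` are hypothetical illustrations, and `simple_normalize` is a simplified stand-in, not the normalizer used for the reported scores; in practice, libraries such as `jiwer` or Hugging Face `evaluate` compute WER.

```python
import re
import string
from typing import List


def word_edit_distance(reference: List[str], hypothesis: List[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    # prev[j] = distance between the reference words seen so far and hypothesis[:j]
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_words = sum(len(ref.split()) for ref in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    errors = sum(
        word_edit_distance(ref.split(), hyp.split())
        for ref, hyp in zip(references, hypotheses)
    )
    return errors / total_words


def simple_normalize(text: str) -> str:
    """Hypothetical, simplified normalizer: lowercase, strip ASCII punctuation, collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


refs = ["The cat sat."]
hyps = ["the cat sat"]
# Orthographic: "The"/"the" and "sat."/"sat" both count as errors -> 2 / 3 words
print(wer(refs, hyps))
# After normalization the strings match exactly -> 0.0
print(wer([simple_normalize(r) for r in refs], [simple_normalize(h) for h in hyps]))
```

Casing and punctuation mistakes count against the orthographic score but are removed by normalization, so the normalized score isolates errors in the words themselves.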


## How to Get Started with the Model

Use the code below to run the model for inference.

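```python
from transformers import pipeline
import torch

# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# chunk_length_s=30 splits long recordings into 30-second windows (Whisper's input length)
pipe = pipeline(
    "automatic-speech-recognition",
    model="meisin123/whisper-small-iban",
    chunk_length_s=30,
    device=device,
)

audio_file = "audio.mp3"  # replace with the path to your own audio (decoding files requires ffmpeg)

# The pipeline returns a dict; the transcription is stored under "text"
result = pipe(audio_file, batch_size=16)
print(result["text"])
```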
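The `chunk_length_s=30` argument lets the pipeline transcribe recordings longer than Whisper's 30-second input window by splitting them into overlapping chunks. The sketch below is a simplified illustration of that idea, not the `transformers` implementation: `chunk_windows` is a hypothetical helper, audio is assumed to be sampled at 16 kHz, and the default overlap of `chunk_length_s / 6` on each side is an assumption based on the pipeline's `stride_length_s` default, which may differ between library versions.

```python
from typing import List, Optional, Tuple


def chunk_windows(
    num_samples: int,
    sampling_rate: int = 16_000,
    chunk_length_s: float = 30.0,
    stride_length_s: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering a recording.

    Consecutive windows overlap by 2 * stride_length_s seconds so that a word cut at
    one window's boundary is seen in full by a neighbouring window.
    """
    if stride_length_s is None:
        stride_length_s = chunk_length_s / 6  # assumed default, mirroring the pipeline
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # how far each new window advances
    if step <= 0:
        raise ValueError("stride_length_s is too large for chunk_length_s")

    windows = []
    for start in range(0, num_samples, step):
        windows.append((start, min(start + chunk_len, num_samples)))
        if start + chunk_len >= num_samples:  # this window already reaches the end
            break
    return windows


# A 70-second recording at 16 kHz -> windows at 0-30 s, 20-50 s and 40-70 s
for start, end in chunk_windows(70 * 16_000):
    print(f"{start / 16_000:.0f}s - {end / 16_000:.0f}s")
```

Each window is transcribed separately and the per-window transcriptions are then merged; the overlap gives the model context on both sides of every boundary.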

## Training Details

### Training Data
The model is trained on the Iban Speech Corpus, which is available on the Hugging Face Hub; see the [dataset card](https://huggingface.co/datasets/meisin123/iban_speech_corpus) for more information.
Iban is an under-resourced language. The Iban language (jaku Iban) is spoken by the Iban, one of the Dayak ethnic groups, who live in Brunei, the Indonesian province of West Kalimantan, and the Malaysian state of Sarawak. It belongs to the Malayic subgroup of the Malayo-Polynesian branch of the Austronesian language family.

## Evaluation

### Performance and Limitations
There is still a lot of room for improvement in this Iban ASR model.
1. Accuracy could be improved with more training data. Because Iban is an under-resourced language, only limited audio data is available for training.
2. The model does not yet handle code-switched speech: when a recording mixes English and Iban, it transcribes the English portions poorly.


## Model Card Contact

For more information, please contact the author at meisin123@gmail.com.