---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
widget:
- example_title: Sample Iban audio
  src: ibf_003_014.wav
---

# Whisper Small for Bahasa Iban - Meisin Lee

This model is a fine-tuned version of openai/whisper-small on the [Iban Speech Corpus](https://huggingface.co/datasets/meisin123/iban_speech_corpus). More specifically, this Iban ASR model is fine-tuned from the **most similar** language, which in this case is Malay.

It achieves the following results on the evaluation set:
- Loss: 0.257025
- Wer Ortho: 0.158626
- Wer: 0.158781

## How to Get Started with the Model

Use the code below to run the model in **Inference Mode**.

```
from transformers import pipeline
import torch

# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Long recordings are split into 30-second chunks before transcription
pipe = pipeline(
    "automatic-speech-recognition",
    model="meisin123/whisper-small-iban",
    chunk_length_s=30,
    device=device,
)

audio_file = "audio.mp3"  # use your own audio here

# The pipeline returns a dict; the transcription is under the "text" key
transcribed_text = pipe(audio_file, batch_size=16)["text"]
```

## Training Details

### Training Data

The model is trained on the Iban Speech Corpus. The dataset is available on Hugging Face; more information is available [here](https://huggingface.co/datasets/meisin123/iban_speech_corpus).

Iban is an under-resourced language. The Iban language (jaku Iban) is spoken by the Iban, one of the Dayak ethnic groups, who live in Brunei, the Indonesian province of West Kalimantan, and the Malaysian state of Sarawak. It belongs to the Malayic subgroup, a Malayo-Polynesian branch of the Austronesian language family.

## Evaluation

### Performance and Limitations

There is still a lot of room for improvement in this Iban ASR model.
1. The accuracy of the model can be further improved with more training data. As Iban is an under-resourced language, there is limited audio data to train on.
2. Currently, the model is not able to handle code-switched speech. If the audio contains a combination of English and Iban, the model performs poorly on the English portions.

## Model Card Contact

For more information, please contact the author at meisin123@gmail.com
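
## Appendix: WER Metric Sketch

The results above list both "Wer Ortho" and "Wer". This card does not state how they were computed. The sketch below assumes the common convention in which the orthographic WER is measured on raw text (case and punctuation included) and the plain WER is measured after text normalization. The functions `word_error_rate` and `simple_normalize` are hypothetical helpers written for illustration; the normalizer is a simplified stand-in and is not the one used to produce the numbers reported here. The sentence pair is a placeholder, not a sample from the Iban Speech Corpus.

```python
import string


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def simple_normalize(text: str) -> str:
    """Simplified normalizer (assumption): lowercase, drop ASCII punctuation,
    and collapse repeated whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


# Placeholder sentence pair for illustration only
reference = "Selamat pagi, Sarawak!"
hypothesis = "selamat pagi sarawak"

# Orthographic WER: case and punctuation differences count as errors
wer_ortho = word_error_rate(reference, hypothesis)

# Normalized WER: both sides are normalized before comparison
wer = word_error_rate(simple_normalize(reference), simple_normalize(hypothesis))

print(f"wer_ortho={wer_ortho:.3f}, wer={wer:.3f}")
```

In this toy pair the two metrics diverge sharply because every difference is one of casing or punctuation. On real transcripts the gap depends on how closely the model reproduces the reference orthography.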