Pathumma Whisper Large V3 (TH)

Model Description

Additional information is needed

Quickstart

You can transcribe audio files with the Transformers pipeline API using the following code snippet:

import torch
from transformers import pipeline

# Use the GPU with bfloat16 when available; otherwise fall back to CPU with float32.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

lang = "th"
task = "transcribe"

pipe = pipeline(
    task="automatic-speech-recognition",
    model="nectec/Pathumma-whisper-th-large-v3",
    torch_dtype=torch_dtype,
    device=device,
)
# Force Thai transcription instead of language auto-detection or translation.
pipe.model.config.forced_decoder_ids = pipe.tokenizer.get_decoder_prompt_ids(language=lang, task=task)

# Replace "audio_path.wav" with the path to your own audio file.
text = pipe("audio_path.wav")["text"]
print(text)

Evaluation Performance

Word error rate (WER) is calculated after segmenting Thai text into words with the newmm tokenizer. Lower is better.

| Model | CV18 (WER) | Gowejee (WER) | LOTUS-TRD (WER) | Thai Dialect (WER) | Elderly (WER) | Gigaspeech2 (WER) | Fleurs (WER) | Distant Meeting (WER) | Podcast (WER) |
|---|---|---|---|---|---|---|---|---|---|
| whisper-large-v3 | 18.75 | 46.59 | 48.14 | 57.82 | 12.27 | 33.26 | 24.08 | 72.57 | 41.24 |
| airesearch-wav2vec2-large-xlsr-53-th | 8.49 | 17.28 | 63.01 | 48.53 | 11.29 | 52.72 | 37.32 | 85.11 | 65.12 |
| thonburian-whisper-th-large-v3-combined | 7.62 | 22.06 | 41.95 | 26.53 | 1.63 | 25.22 | 13.90 | 64.68 | 32.42 |
| monsoon-whisper-medium-gigaspeech2 | 11.66 | 20.50 | 41.04 | 42.06 | 7.57 | 21.40 | 21.54 | 51.65 | 38.89 |
| pathumma-whisper-th-large-v3 | 8.68 | 9.84 | 15.47 | 19.85 | 1.53 | 21.66 | 15.65 | 51.56 | 36.47 |

Note: The other models were not specifically fine-tuned on dialect datasets, so their Thai Dialect scores may not be a representative comparison.
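
Because Thai is written without spaces between words, WER depends on how the text is segmented. The sketch below illustrates one way a word-level WER with newmm segmentation could be computed. It is not the evaluation pipeline used for the table above: the helper names (`edit_distance`, `wer`, `newmm_tokenize`) are hypothetical, no text normalization is applied, and `newmm_tokenize` assumes the PyThaiNLP package is installed.

```python
from typing import Callable, List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str,
        tokenize: Callable[[str], List[str]]) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def newmm_tokenize(text: str) -> List[str]:
    """Segment Thai text with PyThaiNLP's newmm engine (assumed installed)."""
    # Imported lazily so the functions above work without PyThaiNLP.
    from pythainlp.tokenize import word_tokenize
    # Drop whitespace-only tokens so spacing does not count as errors.
    return [tok for tok in word_tokenize(text, engine="newmm") if tok.strip()]
```

For Thai transcripts, pass `newmm_tokenize` as the `tokenize` argument, for example `wer(reference_text, pipe("audio_path.wav")["text"], newmm_tokenize)`. Multiply the result by 100 to express it as a percentage.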

Limitations and Future Work

Additional information is needed

Acknowledgements

We thank the research teams behind the open speech models we build on and compare against, including AIResearch, BiodatLab, Looloo Technology, SCB 10X, and OpenAI. We are grateful to Dr. Titipat Achakulwisut of BiodatLab for the evaluation pipeline, and to ThaiSC (NSTDA Supercomputer Centre) for providing the LANTA supercomputer used for model training, fine-tuning, and evaluation.

Pathumma Audio Team

Pattara Tipaksorn, Wayupuk Sommuang, Oatsada Chatthong, Kwanchiva Thangthai

Citation

@misc{tipaksorn2024PathummaWhisper,
    title        = { {Pathumma Whisper Large V3 (TH)} },
    author       = { Pattara Tipaksorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
    url          = { https://huggingface.co/nectec/Pathumma-whisper-th-large-v3 },
    publisher    = { Hugging Face },
    year         = { 2024 },
}