playwithmino/PhoWhisper-large-peft-VietMed

Model Details

Base model: vinai/PhoWhisper-large
This model is fine-tuned on the VietMed training set, which reduces the word error rate (WER) on the VietMed test set from 26.14 to 21.63.
To use or reproduce this fine-tuned model, load the tokenizer and processor from vinai/PhoWhisper-large.
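For reference, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words: (substitutions + deletions + insertions) / N. The sketch below is a plain dynamic-programming version of that definition. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the card does not say which text normalization was applied before scoring, so this is not necessarily how the reported numbers were computed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper; splits on whitespace with no extra text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped, e.g. 0.25 means 25% fewer errors."""
    return (before - after) / before


print(word_error_rate("bệnh nhân bị sốt cao", "bệnh nhân sốt rất cao"))  # 2 edits / 5 words
print(relative_reduction(26.14, 21.63))
```

Applied to the figures above, going from 26.14 to 21.63 is a relative WER reduction of roughly 17%.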

Model Description

Finetuned by: Play-With-Mino
Model type: Whisper
Language(s) (NLP): Vietnamese
Finetuned from model: vinai/PhoWhisper-large

How to use

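import numpy as np
import torch
from scipy.io import wavfile
from transformers import pipeline, AutoModelForSpeechSeq2Seq, AutoProcessor

# Read the WAV file: scipy returns (sampling_rate, samples)
sampling_rate, audio_array = wavfile.read("path_to_your_wav_file")
# Integer PCM samples (e.g. 16-bit) -> float32 in [-1, 1], the range the feature extractor expects
if np.issubdtype(audio_array.dtype, np.integer):
    audio_array = audio_array.astype(np.float32) / np.iinfo(audio_array.dtype).max
# The pipeline expects single-channel audio, so average stereo channels
if audio_array.ndim > 1:
    audio_array = audio_array.mean(axis=1)
# The pipeline resamples to 16 kHz if needed (this requires torchaudio)
audio_input = {
    "path": "path_to_your_wav_file",
    "array": audio_array,
    "sampling_rate": sampling_rate
}

device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
# The fine-tuned checkpoint reuses the base model's tokenizer and feature extractor
processor = AutoProcessor.from_pretrained("vinai/PhoWhisper-large")
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "playwithmino/PhoWhisper-large-peft-VietMed",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True
)
model.to(device)
# chunk_length_s=30 splits long recordings into 30-second windows
transcriber = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,
    torch_dtype=torch_dtype,
    device=device,
    batch_size=32
)
transcriptions = transcriber(audio_input)
print(transcriptions["text"])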
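With `chunk_length_s=30`, the pipeline splits long recordings into overlapping 30-second windows, transcribes them in batches, and stitches the results using the overlapping margins. The sketch below only illustrates how such window boundaries can be computed; it is a simplified stand-in, not the Transformers implementation. The helper name `chunk_spans` is hypothetical, and the default stride of `chunk_length_s / 6` on each side is an assumption based on recent Transformers versions.

```python
from typing import List, Optional, Tuple


def chunk_spans(
    num_samples: int,
    sampling_rate: int,
    chunk_length_s: float = 30.0,
    stride_length_s: Optional[float] = None,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows (hypothetical helper)."""
    if stride_length_s is None:
        # Assumed default: one sixth of the chunk length as overlap on each side
        stride_length_s = chunk_length_s / 6
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    # Each new window advances by the chunk length minus the left and right overlaps
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk must be longer than twice the stride")
    spans = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        spans.append((start, end))
        if end == num_samples:  # the last window reaches the end of the audio
            break
    return spans


print(chunk_spans(70 * 16000, 16000))
```

For a 70-second recording at 16 kHz, this yields three windows covering 0 to 30 s, 20 to 50 s and 40 to 70 s. The overlap gives the model context on both sides of each boundary, so words cut at a window edge can be recovered from the neighbouring window.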

Framework versions

PEFT 0.13.2
Transformers 4.36.0
