Model Card for openai-whisper-large-v2-LORA-ja
A LoRA adapter for Japanese speech transcription. Testing is still in progress to see how well it performs; the main personal use case is transcribing Japanese comedy.
Running Whisper Large V2 with this LoRA uses about 9 GB of VRAM.
Model Details
Model Description
openai-whisper-large-v2-LORA-ja
- Developed by: FZNX
- Model type: PEFT LORA
- Language(s) (NLP): Japanese (Whisper fine-tuned on Common Voice 16)
- License: [More Information Needed]
- Finetuned from model: Whisper Large V2 (openai/whisper-large-v2)
How to Get Started with the Model
import torch
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperTokenizer,
    WhisperProcessor,
)
from peft import PeftModel, PeftConfig

peft_model_id = "fznx92/openai-whisper-large-v2-ja-transcribe-colab"
sample = "insert mp3 file location here"

language = "japanese"
task = "transcribe"

peft_config = PeftConfig.from_pretrained(peft_model_id)
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path,
)
model = PeftModel.from_pretrained(model, peft_model_id)
model.to("cuda").half()

processor = WhisperProcessor.from_pretrained(
    peft_config.base_model_name_or_path, language=language, task=task
)

pipe = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    batch_size=8,
    torch_dtype=torch.float16,
    device="cuda:0",
)


def transcribe(audio, return_timestamps=False):
    text = pipe(
        audio,
        chunk_length_s=30,
        return_timestamps=return_timestamps,
        generate_kwargs={"language": language, "task": task},
    )["text"]
    return text


transcript = transcribe(sample)
print(transcript)
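The `transcribe` function above accepts `return_timestamps` but returns only the `"text"` field, so any timestamps are discarded. When the pipeline is called with `return_timestamps=True`, its result also carries a `"chunks"` list whose items hold a `"timestamp"` pair and a `"text"` string. The following is a minimal sketch of turning such a result into SRT subtitles, which may be handy for comedy clips. The helper names `format_timestamp` and `chunks_to_srt` are hypothetical and not part of this repository.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(result: dict) -> str:
    """Turn a pipeline result containing a "chunks" list into SRT text."""
    entries = []
    for index, chunk in enumerate(result.get("chunks", []), start=1):
        start, end = chunk["timestamp"]
        # The last chunk's end time can be None when the audio is cut off;
        # falling back to the start time is an arbitrary choice made here.
        if end is None:
            end = start
        entries.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Hand-written stand-in for a pipeline result, so this runs without a GPU.
example = {
    "text": "こんにちは。ありがとう。",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": "こんにちは。"},
        {"timestamp": (1.5, 3.25), "text": "ありがとう。"},
    ],
}
print(chunks_to_srt(example))
```

To use it with the real model, call `pipe(sample, chunk_length_s=30, return_timestamps=True, generate_kwargs={"language": language, "task": task})` and pass the whole returned dictionary to `chunks_to_srt` instead of indexing `["text"]`.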
Training Data
Common Voice 16 dataset
Training Procedure
Trained on Google Colab (T5) for about 6 hours.
Evaluation
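No evaluation results are listed for this adapter. Because written Japanese does not separate words with spaces, Japanese ASR output is commonly scored by character error rate (CER) rather than word error rate. The following is a minimal sketch of how CER could be computed, assuming reference transcripts are available for the test audio. The function names are illustrative and not part of this repository, and no text normalization beyond whitespace removal is applied.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        current = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character edits / reference length, with whitespace removed."""
    ref = "".join(reference.split())
    hyp = "".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of five in the reference.
print(character_error_rate("こんにちは", "こんにちわ"))
```

In practice the hypothesis would come from `transcribe(...)` on a held-out clip, and punctuation would likely need to be normalized on both sides before scoring.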
Framework versions
- PEFT 0.7.1