alvanlii/whisper-largev2-cantonese-peft-lora

Automatic Speech Recognition


alvanlii committed on Mar 8, 2023

Commit dfb8dc8 · 1 Parent(s): 91e9ad0

Add more instructions

Files changed (1)

README.md +19 -10


@@ -34,7 +34,7 @@ This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingf
 To use the model, run the following code. Inference should fit in under 16GB of VRAM.
 ```
 from peft import PeftModel, PeftConfig
-from transformers import WhisperForConditionalGeneration, Seq2SeqTrainer
 peft_model_id = "alvanlii/whisper-largev2-cantonese-peft-lora"
 peft_config = PeftConfig.from_pretrained(peft_model_id)
@@ -42,6 +42,16 @@ model = WhisperForConditionalGeneration.from_pretrained(
     peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
 )
 model = PeftModel.from_pretrained(model, peft_model_id)
 ```
 ## Training and evaluation data
@@ -64,12 +74,11 @@ For training, three datasets were used:
 ## Training Results
-| Training Loss | Epoch | Step | Validation Loss | Normalized CER    |
-|:-------------:|:-----:|:----:|:---------------:|:------:|
-| <TBA>        | 0.55  | 2000 | <TBA>          | <TBA> |
-| <TBA>        | 1.11  | 4000 | <TBA>          | <TBA> |
-| <TBA>        | 1.66  | 6000 | <TBA>          | <TBA> |
-| <TBA>        | 2.22  | 8000 | <TBA>          | <TBA> |
-| <TBA>        | 2.77  | 10000 | <TBA>          | <TBA> |
-| <TBA>        | 3.32  | 12000 | <TBA>          | <TBA> |
-| <TBA>        | 3.88  | 14000 | <TBA>          | <TBA> |

 To use the model, run the following code. Inference should fit in under 16GB of VRAM.
 ```
 from peft import PeftModel, PeftConfig
+from transformers import WhisperForConditionalGeneration, Seq2SeqTrainer, WhisperTokenizer, WhisperProcessor, AutomaticSpeechRecognitionPipeline
 peft_model_id = "alvanlii/whisper-largev2-cantonese-peft-lora"
 peft_config = PeftConfig.from_pretrained(peft_model_id)
     peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
 )
 model = PeftModel.from_pretrained(model, peft_model_id)
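+language = "Chinese"  # assumption: `language` is used below but never defined in the original snippet
+task = "transcribe"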
+tokenizer = WhisperTokenizer.from_pretrained(peft_config.base_model_name_or_path, task=task)
+processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, task=task)
+feature_extractor = processor.feature_extractor
+forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)
+pipe = AutomaticSpeechRecognitionPipeline(model=model, tokenizer=tokenizer, feature_extractor=feature_extractor)
+audio = "path/to/audio.wav"  # placeholder path; replace with your own audio file
+text = pipe(audio, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]
 ```
 ## Training and evaluation data
 ## Training Results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| <TBA>        | 0.55  | 2000 | <TBA>          |
+| <TBA>        | 1.11  | 4000 | <TBA>          |
+| <TBA>        | 1.66  | 6000 | <TBA>          |
+| <TBA>        | 2.22  | 8000 | <TBA>          |
+| <TBA>        | 2.77  | 10000 | <TBA>          |
+| <TBA>        | 3.32  | 12000 | <TBA>          |
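
The previous revision of this table had a Normalized CER column, which this commit drops. For readers unfamiliar with the metric, here is a minimal sketch of character error rate (CER): the character-level edit distance between reference and hypothesis, divided by the reference length. The helper names (`edit_distance`, `normalize`, `cer`) are illustrative. The normalization step is an assumption, since the README does not say which text normalization was applied, so this version only strips whitespace.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    # Assumed normalization: remove all whitespace. The normalization actually
    # used for the model card's "Normalized CER" is not stated in the source.
    return "".join(text.split())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(cer("你好世界", "你好世间"))  # 0.25
```

CER is generally preferred over word error rate for Chinese text because written Chinese has no whitespace word boundaries, so counting errors per character avoids depending on a word segmenter.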