Commit dfb8dc8 by alvanlii · 1 Parent(s): 91e9ad0

Add more instructions

Files changed (1)
  1. README.md +19 -10
README.md CHANGED
@@ -34,7 +34,7 @@ This model is a fine-tuned version of [openai/whisper-large-v2](https://huggingf
  To use the model, use the following code. It should be able to inference with less than 16GB VRAM.
  ```
  from peft import PeftModel, PeftConfig
- from transformers import WhisperForConditionalGeneration, Seq2SeqTrainer
 
  peft_model_id = "alvanlii/whisper-largev2-cantonese-peft-lora"
  peft_config = PeftConfig.from_pretrained(peft_model_id)
@@ -42,6 +42,16 @@ model = WhisperForConditionalGeneration.from_pretrained(
  peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
  )
  model = PeftModel.from_pretrained(model, peft_model_id)
  ```
 
  ## Training and evaluation data
@@ -64,12 +74,11 @@ For training, three datasets were used:
 
  ## Training Results
 
- | Training Loss | Epoch | Step | Validation Loss | Normalized CER |
- |:-------------:|:-----:|:----:|:---------------:|:------:|
- | <TBA> | 0.55 | 2000 | <TBA> | <TBA> |
- | <TBA> | 1.11 | 4000 | <TBA> | <TBA> |
- | <TBA> | 1.66 | 6000 | <TBA> | <TBA> |
- | <TBA> | 2.22 | 8000 | <TBA> | <TBA> |
- | <TBA> | 2.77 | 10000 | <TBA> | <TBA> |
- | <TBA> | 3.32 | 12000 | <TBA> | <TBA> |
- | <TBA> | 3.88 | 14000 | <TBA> | <TBA> |
 
  To use the model, use the following code. It should be able to inference with less than 16GB VRAM.
  ```
  from peft import PeftModel, PeftConfig
+ from transformers import WhisperForConditionalGeneration, Seq2SeqTrainer, WhisperTokenizer, WhisperProcessor
 
  peft_model_id = "alvanlii/whisper-largev2-cantonese-peft-lora"
  peft_config = PeftConfig.from_pretrained(peft_model_id)
 
  peft_config.base_model_name_or_path, load_in_8bit=True, device_map="auto"
  )
  model = PeftModel.from_pretrained(model, peft_model_id)
+
+ task = "transcribe"
+ tokenizer = WhisperTokenizer.from_pretrained(peft_config.base_model_name_or_path, task=task)
+ processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, task=task)
+ feature_extractor = processor.feature_extractor
+ forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)
+ pipe = AutomaticSpeechRecognitionPipeline(model=model, tokenizer=tokenizer, feature_extractor=feature_extractor)
+
+ audio = # load audio here
+ text = pipe(audio, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]
  ```
 
  ## Training and evaluation data
 
 
  ## Training Results
 
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | <TBA> | 0.55 | 2000 | <TBA> |
+ | <TBA> | 1.11 | 4000 | <TBA> |
+ | <TBA> | 1.66 | 6000 | <TBA> |
+ | <TBA> | 2.22 | 8000 | <TBA> |
+ | <TBA> | 2.77 | 10000 | <TBA> |
+ | <TBA> | 3.32 | 12000 | <TBA> |
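
Note on the usage snippet added in this commit: as committed, it calls `AutomaticSpeechRecognitionPipeline` without importing it, passes a `language` variable that is never defined, and leaves `audio` unassigned, so it will not run as pasted. The following is a minimal sketch that fills those gaps while keeping the commit's calls unchanged. It is not the author's code: `LANGUAGE = "zh"` is an assumption (the commit does not say which language code was intended), and the function names `build_pipeline` and `transcribe` and the lazy imports are illustrative choices.

```python
# Sketch only: completes the README snippet from this commit so it parses and runs.
# Assumptions (not in the commit): LANGUAGE = "zh", the function wrappers,
# and the lazy imports inside build_pipeline.

PEFT_MODEL_ID = "alvanlii/whisper-largev2-cantonese-peft-lora"
TASK = "transcribe"
LANGUAGE = "zh"  # assumption: the commit never defines `language`


def build_pipeline(peft_model_id=PEFT_MODEL_ID, language=LANGUAGE, task=TASK):
    """Load the 8-bit base model, attach the LoRA adapter, and build an ASR pipeline."""
    # Heavy imports stay inside the function so this module can be imported
    # on a machine without peft/transformers installed.
    from peft import PeftConfig, PeftModel
    from transformers import (
        AutomaticSpeechRecognitionPipeline,  # missing from the commit's import line
        WhisperForConditionalGeneration,
        WhisperProcessor,
        WhisperTokenizer,
    )

    peft_config = PeftConfig.from_pretrained(peft_model_id)
    base = peft_config.base_model_name_or_path
    model = WhisperForConditionalGeneration.from_pretrained(
        base, load_in_8bit=True, device_map="auto"
    )
    model = PeftModel.from_pretrained(model, peft_model_id)

    tokenizer = WhisperTokenizer.from_pretrained(base, task=task)
    processor = WhisperProcessor.from_pretrained(base, task=task)
    forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)
    pipe = AutomaticSpeechRecognitionPipeline(
        model=model,
        tokenizer=tokenizer,
        feature_extractor=processor.feature_extractor,
    )
    return pipe, forced_decoder_ids


def transcribe(audio, pipe, forced_decoder_ids, max_new_tokens=255):
    """Run the pipeline on `audio` (e.g. a path to an audio file) and return the text."""
    result = pipe(
        audio,
        generate_kwargs={"forced_decoder_ids": forced_decoder_ids},
        max_new_tokens=max_new_tokens,
    )
    return result["text"]
```

Typical use would be `pipe, ids = build_pipeline()` followed by `transcribe("sample.wav", pipe, ids)`, where `sample.wav` is a placeholder path. Depending on the installed `transformers` version, passing `language` and `task` through `generate_kwargs` may be needed in place of `forced_decoder_ids`.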