@@ -26,21 +26,48 @@ model-index:
     - name: Wer
       type: wer
       value: 3.8273540533062804
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
 # Whisper Medium Indonesian
-This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the mozilla-foundation/common_voice_11_0, magic_data, titml id dataset.
-It achieves the following results on the evaluation set:
 - Loss: 0.0698
 - Wer: 3.8274
-## Model description
-More information needed
 ## Intended uses & limitations
@@ -80,7 +107,29 @@ The following hyperparameters were used during training:
 | 0.0122        | 2.98  | 9000  | 0.0714          | 3.9795 |
 | 0.0049        | 3.31  | 10000 | 0.0720          | 3.9887 |
 ### Framework versions
 - Transformers 4.26.0.dev0

     - name: Wer
       type: wer
       value: 3.8273540533062804
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: google/fleurs id_id
+      type: google/fleurs
+      config: id_id
+      split: test
+    metrics:
+    - name: Wer
+      type: wer
+      value: 9.74
 ---
 # Whisper Medium Indonesian
+This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on the
+Indonesian mozilla-foundation/common_voice_11_0, magic_data, titml and google/fleurs datasets. It achieves the following
+results:
+### CV11 test split:
 - Loss: 0.0698
 - Wer: 3.8274
+### Google/fleurs test split:
+- Wer: 9.74
+## Usage
+```python
+from transformers import pipeline
+transcriber = pipeline(
+  "automatic-speech-recognition",
+  model="cahya/whisper-medium-id"
+)
+transcriber.model.config.forced_decoder_ids = (
+  transcriber.tokenizer.get_decoder_prompt_ids(
+    language="id",
+    task="transcribe"
+  )
+)
+transcription = transcriber("my_audio_file.mp3")
+```
 ## Intended uses & limitations
 | 0.0122        | 2.98  | 9000  | 0.0714          | 3.9795 |
 | 0.0049        | 3.31  | 10000 | 0.0720          | 3.9887 |
+## Evaluation
+We evaluated the model on the test splits of two datasets, [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0)
+and [Google Fleurs](https://huggingface.co/datasets/google/fleurs).
+Because Whisper transcribes casing and punctuation, we evaluate its performance on both raw and normalized text
+(lowercased, with punctuation removed). The results are as follows:
+### Common Voice 11
+|                                                                           | WER  |
+|---------------------------------------------------------------------------|------|
+| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id) | 3.83 |
+| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)     | tbc  |
+### Google/Fleurs
+|                                                                                                             | WER  |
+|-------------------------------------------------------------------------------------------------------------|------|
+| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id)                      | 9.74 |
+| [cahya/whisper-medium-id](https://huggingface.co/cahya/whisper-medium-id) + text normalization | tbc  |
+| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)                                       | tbc  |
+| [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) + text normalization                  | tbc  |
+
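The difference between raw and normalized scoring described in the Evaluation section can be illustrated with a small, dependency-free sketch. The card does not say which tool produced the reported scores, so everything here is an assumption: the helper names `normalize`, `word_errors` and `wer` are hypothetical, the normalization only lowercases and strips ASCII punctuation, and the sample sentence is made up for illustration rather than taken from either dataset.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed scheme)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references, hypotheses, normalized=False) -> float:
    """Corpus-level word error rate in percent."""
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        errors += word_errors(ref, hyp)
        total += len(ref.split())
    return 100.0 * errors / total


# Illustrative pair: the hypothesis differs only in casing and punctuation.
references = ["Selamat pagi, apa kabar?"]
hypotheses = ["selamat pagi apa kabar"]

raw_wer = wer(references, hypotheses)
normalized_wer = wer(references, hypotheses, normalized=True)
print(f"raw: {raw_wer:.2f}  normalized: {normalized_wer:.2f}")
```

On this pair the raw score counts three of four words as errors even though the spoken content matches, while the normalized score is zero. A real evaluation would more likely rely on an established metric library and a more careful normalizer than this sketch.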
 ### Framework versions
 - Transformers 4.26.0.dev0