---
library_name: transformers
language:
- en
license: apache-2.0
base_model: openai/whisper-small
tags:
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
model-index:
- name: Storymation-whisper Fine-Tuned Model
  results: []
---

# Storymation-whisper Fine-Tuned Model

This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Common Voice 11.0 dataset.

## Model Usage

```python
# Install the dependencies first (in a notebook, prefix the command with "!"):
#   pip install transformers accelerate gradio
from transformers import pipeline
import gradio as gr

# Load the fine-tuned Whisper model as a speech-recognition pipeline
model = "Muneeba23/whisper-small-en"
pipe = pipeline("automatic-speech-recognition", model=model)

# Transcribe an audio file (given as a file path) and return the text
def transcribe(audio):
    text = pipe(audio)["text"]
    return text

# Create the Gradio interface
iface = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),
    outputs="text",
    title="Whisper Small",
    description="Real-time Demo. Hurrah!!"
)

# Launch the interface
iface.launch()
```

## Intended uses & limitations

For an average audio prompt of 5 seconds, the observed latency was 40 seconds.

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 3

### Training results

- global_step: 3
- training_loss: 5.196450551350911
- WER: 30% after 8 hours of training

### Framework versions

- Transformers 4.45.2
- Pytorch 2.4.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.0
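
The hyperparameters listed above relate to each other in two ways that are easy to check by hand: `total_train_batch_size` is the per-device batch size multiplied by the gradient accumulation steps, and a `linear` scheduler with warmup ramps the learning rate up before decaying it. The sketch below illustrates both in plain Python. It assumes a single training device and the usual warmup-then-linear-decay shape; the helper names and the `total_steps=4000` value are hypothetical and chosen only for illustration, not taken from the training run.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Values from the hyperparameter list: batch size 4, accumulation 4
print(effective_batch_size(per_device=4, accumulation_steps=4))

# Values from the list (1e-05, 500 warmup steps) plus a hypothetical total_steps
for step in (0, 250, 500, 4000):
    print(step, linear_warmup_lr(step, base_lr=1e-05, warmup_steps=500, total_steps=4000))
```

With a per-device batch size of 4 and 4 accumulation steps on one device, the effective batch size works out to the 16 reported as `total_train_batch_size`.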
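
The WER (word error rate) figure reported under Training results is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that calculation in plain Python. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration; this card does not state which implementation or text normalization produced the reported value.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A WER of 30% therefore corresponds to roughly three word errors for every ten reference words.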