Ayoub-Laachir/MaghrebVoice

Ayoub-Laachir committed on Oct 2

Commit 246daeb • 1 Parent(s): 346f4e6

Update README.md


README.md +127 -0

@@ -59,6 +59,133 @@ These metrics demonstrate the model's ability to accurately transcribe Moroccan
 The fine-tuned model shows improved handling of Darija-specific words, sentence structure, and overall accuracy.
+## Audio Transcription Script
+This script transcribes audio files with the fine-tuned Whisper Large V3 model for Moroccan Darija. It loads the model and processor, resamples the input audio to 16 kHz, and prints the transcription as timestamped WebVTT cues.
+### Required Libraries
+Before running the script, install the libraries below (PyTorch, imported as `torch`, must also be available). The commands are written for a shell; in a Jupyter or Colab notebook, prefix each one with `!`:
+```bash
+pip install --upgrade pip
+pip install --upgrade transformers accelerate librosa soundfile pydub
+```
+```python
+import torch
+from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+import librosa
+import soundfile as sf
+from pydub import AudioSegment
+# Set the device to GPU if available, else use CPU
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+# Configuration for the model
+config = {
+    "model_id": "Ayoub-Laachir/MaghrebVoice",  # Model ID from Hugging Face
+    "language": "arabic",                          # Language for transcription
+    "task": "transcribe",                          # Task type
+    "chunk_length_s": 30,                          # Length of each audio chunk in seconds
+    "stride_length_s": 5,                          # Overlap between chunks in seconds
+}
+# Load the model and processor
+def load_model_and_processor():
+    try:
+        model = AutoModelForSpeechSeq2Seq.from_pretrained(
+            config["model_id"],
+            torch_dtype=torch_dtype,               # Use appropriate data type
+            low_cpu_mem_usage=True,                # Use low CPU memory
+            use_safetensors=True,                   # Load model with safetensors
+            attn_implementation="sdpa",            # Specify attention implementation
+        )
+        model.to(device)  # Move model to the specified device
+        processor = AutoProcessor.from_pretrained(config["model_id"])
+        print("Model and processor loaded successfully.")
+        return model, processor
+    except Exception as e:
+        print(f"Error loading model and processor: {e}")
+        return None, None
+# Load the model and processor
+model, processor = load_model_and_processor()
+if model is None or processor is None:
+    print("Failed to load model or processor")
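+    raise SystemExit(1)  # Stop with a non-zero exit code; the bare exit() helper is meant for interactive sessions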
+# Configure the generation parameters for the pipeline
+generate_kwargs = {
+    "language": config["language"],  # Language for the pipeline
+    "task": config["task"],          # Task for the pipeline
+}
+# Initialize the automatic speech recognition pipeline
+pipe = pipeline(
+    "automatic-speech-recognition",
+    model=model,
+    tokenizer=processor.tokenizer,
+    feature_extractor=processor.feature_extractor,
+    torch_dtype=torch_dtype,
+    device=device,
+    generate_kwargs=generate_kwargs,
+    chunk_length_s=config["chunk_length_s"],  # Length of each audio chunk
+    stride_length_s=config["stride_length_s"],  # Overlap between chunks
+)
+# Convert audio to 16kHz sampling rate
+def convert_audio_to_16khz(input_path, output_path):
+    audio, sr = librosa.load(input_path, sr=None)  # Load the audio file
+    audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=16000)  # Resample to 16kHz
+    sf.write(output_path, audio_16k, 16000)  # Save the converted audio
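+# Format time as HH:MM:SS.mmm (WebVTT timestamp)
+def format_time(seconds):
+    total_ms = int(round(seconds * 1000))  # Round once so the seconds field never renders as "60.000"
+    hours, remainder = divmod(total_ms, 3_600_000)
+    minutes, remainder = divmod(remainder, 60_000)
+    secs, ms = divmod(remainder, 1000)
+    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"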
+# Transcribe audio file
+def transcribe_audio(audio_path):
+    try:
+        result = pipe(audio_path, return_timestamps=True)  # Transcribe audio and get timestamps
+        return result["chunks"]  # Return transcription chunks
+    except Exception as e:
+        print(f"Error transcribing audio: {e}")
+        return None
+# Main function to execute the transcription process
+def main():
+    # Specify input and output audio paths (update paths as needed)
+    input_audio_path = "/path/to/your/input/audio.mp3"  # Replace with your input audio path
+    output_audio_path = "/path/to/your/output/audio_16khz.wav"  # Replace with your output audio path
+    # Convert audio to 16kHz
+    convert_audio_to_16khz(input_audio_path, output_audio_path)
+    # Transcribe the converted audio
+    transcription_chunks = transcribe_audio(output_audio_path)
+    if transcription_chunks:
+        print("WEBVTT\n")  # Print header for WEBVTT format
+        for chunk in transcription_chunks:
+            start, end = chunk["timestamp"]
+            if end is None:                                   # The last chunk may have no end timestamp
+                end = start                                   # Fall back to the start time
+            start_time = format_time(start)                   # Format start time
+            end_time = format_time(end)                       # Format end time
+            text = chunk["text"]                              # Get the transcribed text
+            print(f"{start_time} --> {end_time}")           # Print time range
+            print(f"{text}\n")                               # Print transcribed text
+    else:
+        print("Transcription failed.")
+if __name__ == "__main__":
+    main()
+```
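The script above prints its cues to standard output. As a minimal sketch that is not part of the original commit, the same chunk list could be rendered into a single WebVTT string and saved to a file. The helper names `to_vtt_timestamp`, `chunks_to_webvtt`, and `save_webvtt` are hypothetical. The chunk layout is assumed to match what the pipeline returns with `return_timestamps=True`, and a missing end time is assumed to arrive as `None`.

```python
from pathlib import Path


def to_vtt_timestamp(seconds):
    """Format seconds as HH:MM:SS.mmm, rounding to whole milliseconds first."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chunks_to_webvtt(chunks):
    """Build WebVTT text from chunks shaped like {"timestamp": (start, end), "text": str}."""
    lines = ["WEBVTT", ""]
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if start is None:
            continue  # No usable start time, so skip this cue
        if end is None:
            end = start  # Assumption: reuse the start time when the end is missing
        lines.append(f"{to_vtt_timestamp(start)} --> {to_vtt_timestamp(end)}")
        lines.append(chunk["text"].strip())
        lines.append("")  # A blank line separates cues
    return "\n".join(lines)


def save_webvtt(chunks, output_path):
    """Write the cues to output_path as UTF-8 text."""
    Path(output_path).write_text(chunks_to_webvtt(chunks), encoding="utf-8")


# Hypothetical chunks standing in for real pipeline output
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " first segment"},
    {"timestamp": (2.5, None), "text": " second segment"},
]
vtt_text = chunks_to_webvtt(example_chunks)
print(vtt_text)
```

In the script, `save_webvtt(transcription_chunks, "transcript.vtt")` could replace the print loop in `main()`; the output file name here is only a placeholder. A cue whose end equals its start is a stopgap, and substituting the audio duration for the missing end time would give a more useful cue.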
 ## Challenges and Future Improvements
 ### Challenges Encountered
 - Diverse spellings of words in Moroccan Darija