ivrit-ai/faster-whisper-v2-pd1-e1

Note: If you are looking for our latest dataset and model, please refer to the main README here: https://huggingface.co/ivrit-ai.

Background

This ASR model was trained on a private dataset containing approximately 310 hours of high-quality Hebrew data. The data was transcribed by professional transcription services.

Model name decoding:

This model is a faster-whisper variant of Whisper large-v2 ("v2"), trained on version 1 of our private dataset ("pd1") and saved after one epoch ("e1").
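The naming scheme can be read mechanically. The sketch below splits a name into its parts. The helper names decode_model_name and ModelName are hypothetical, and the regular expression assumes that other models in this family follow the same "<runtime>-v<N>-pd<N>-e<N>" pattern. This card only shows a single example of that pattern.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    runtime: str          # e.g. "faster-whisper"
    whisper_size: str     # "v2" -> Whisper large-v2
    dataset_version: int  # "pdN" -> version N of the private dataset
    epoch: int            # "eN" -> checkpoint saved after N epochs


# Assumed pattern, inferred from this one model name only.
# An optional "org/" prefix (e.g. "ivrit-ai/") is skipped.
_NAME_RE = re.compile(
    r"^(?:[\w.-]+/)?(?P<runtime>.+)-(?P<size>v\d+)-pd(?P<pd>\d+)-e(?P<epoch>\d+)$"
)


def decode_model_name(name: str) -> ModelName:
    """Split a model name such as 'faster-whisper-v2-pd1-e1' into its parts."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    return ModelName(m["runtime"], m["size"], int(m["pd"]), int(m["epoch"]))
```

For example, decode_model_name("ivrit-ai/faster-whisper-v2-pd1-e1") yields runtime "faster-whisper", Whisper size "v2", dataset version 1 and epoch 1.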

Running the model

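import faster_whisper

# Initialize the model (fetched from the Hugging Face Hub on first use)
model = faster_whisper.WhisperModel('ivrit-ai/faster-whisper-v2-pd1-e1')

# Transcribe a media file (placeholder path: replace with your own file)
mp3_file = 'path/to/audio.mp3'
segs, _ = model.transcribe(mp3_file, language='he')

# segs is a lazy generator: decoding happens as it is iterated
for seg in segs:
    print(seg.text)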

Each segment object also carries additional data, such as its start and end timestamps in seconds (seg.start and seg.end). Feel free to explore them.
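As an illustration, the minimal sketch below prints a timestamped transcript. It assumes only that each segment exposes start and end (in seconds) and text, as faster-whisper segments do. The helper names format_timestamp and segment_lines are hypothetical. Because the helpers accept any objects with those three attributes, they can be tried without loading the model.

```python
from typing import Iterable, List, Protocol


class SegmentLike(Protocol):
    # Minimal shape of a faster-whisper segment used here (times in seconds)
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(segments: Iterable[SegmentLike]) -> List[str]:
    """Build one '[start --> end] text' line per segment."""
    return [
        f"[{format_timestamp(s.start)} --> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]
```

With the model from the snippet above, passing the transcribe output to the helper prints one timestamped line per segment: for line in segment_lines(model.transcribe(mp3_file, language='he')[0]): print(line). The segments generator can be consumed only once, so collect it into a list first if you need it again.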