Whisper Basque (eu) - CTranslate2 Conversion

This is a CTranslate2 conversion of xezpeleta/whisper-medium-eu, intended for use with faster-whisper.

Model Details

  • Base Model: OpenAI Whisper Medium (original model card: whisper-medium)
  • Finetuned for: Basque (eu) speech recognition
  • Dataset: asierhv/composite_corpus_eu_v2.1 (Mozilla Common Voice 18.0 + Basque Parliament + OpenSLR)
  • Conversion Format: CTranslate2 (optimized for inference)
  • Compatibility: Designed for use with faster-whisper
  • WER: 8.33% on Mozilla Common Voice 17.0

Usage with faster-whisper

First, install the required package:

pip install faster-whisper

Then use the following code snippet:

from faster_whisper import WhisperModel

# Load the model (FP16 precision)
model = WhisperModel("xezpeleta/whisper-medium-eu-ct2", device="cuda", compute_type="float16")

# Transcribe audio file
segments, info = model.transcribe("audio.mp3", language="eu")

# Print transcription
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
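
The snippet above assumes a CUDA GPU. On a machine without one, a common approach is to fall back to the CPU with int8 computation. The following is a minimal sketch of that fallback, not part of the original model card: pick_runtime and load_model are hypothetical helper names, and the choice of "int8" on CPU is an assumption borrowed from typical faster-whisper usage rather than a setting tested with this model.

```python
def pick_runtime(cuda_available: bool) -> tuple[str, str]:
    """Return a (device, compute_type) pair for WhisperModel.

    Hypothetical helper: float16 on GPU mirrors the snippet above,
    int8 on CPU is an assumed, commonly used fallback.
    """
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "int8"


def load_model(model_id: str = "xezpeleta/whisper-medium-eu-ct2"):
    """Load the model on GPU when one is visible to CTranslate2, else on CPU."""
    # Imported lazily so pick_runtime can be used without these packages.
    import ctranslate2
    from faster_whisper import WhisperModel

    cuda_available = ctranslate2.get_cuda_device_count() > 0
    device, compute_type = pick_runtime(cuda_available)
    return WhisperModel(model_id, device=device, compute_type=compute_type)
```

Calling load_model() returns a model that can be passed through the same transcribe loop as above. Speed and accuracy on CPU may differ from the FP16 GPU setup.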

Evaluation

The model achieves 8.33% Word Error Rate (WER) on the Basque test split of Mozilla Common Voice 17.0.
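
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below illustrates that standard definition for a single sentence pair. It is not the evaluation script behind the reported figure: the text normalization used for that result is not stated here, and word_error_rate is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                      # deletion
                    current[j - 1] + 1,                   # insertion
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current
    return previous[-1] / len(ref_words)
```

A corpus-level score is normally computed by summing the edits and the reference word counts over all utterances before dividing, rather than averaging per-utterance rates.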

Conversion details

Converted from the original Hugging Face model using:

ct2-transformers-converter --model xezpeleta/whisper-medium-eu \
                           --output_dir whisper-medium-eu-ct2 \
                           --copy_files tokenizer.json preprocessor_config.json \
                           --quantization float16
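
To make the conversion repeatable, the same command can be assembled and run from Python. This is a minimal sketch that only wraps the command line shown above. build_convert_command and run_conversion are hypothetical helper names, and the sketch assumes ct2-transformers-converter is already installed and on the PATH.

```python
import subprocess


def build_convert_command(
    model_id,
    output_dir,
    copy_files=("tokenizer.json", "preprocessor_config.json"),
    quantization="float16",
):
    """Build the ct2-transformers-converter argument list used above."""
    command = [
        "ct2-transformers-converter",
        "--model", model_id,
        "--output_dir", output_dir,
    ]
    if copy_files:
        command += ["--copy_files", *copy_files]
    command += ["--quantization", quantization]
    return command


def run_conversion(model_id, output_dir):
    """Run the converter and raise CalledProcessError if it fails."""
    subprocess.run(build_convert_command(model_id, output_dir), check=True)
```

For example, run_conversion("xezpeleta/whisper-medium-eu", "whisper-medium-eu-ct2") would issue the same command as the shell invocation above.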