Whosper-large-v2

Model Overview

Whosper-large-v2 is a state-of-the-art speech recognition model for Wolof, the most widely spoken language in Senegal. Fine-tuned from OpenAI's Whisper-large-v2, it advances African language processing with measurable improvements in Word Error Rate (WER) and Character Error Rate (CER) over its predecessor. It is designed for researchers, developers, and students working with Wolof speech data, whether transcribing conversations, building language-learning tools, or conducting research.

Key Strengths

  • Superior Code-Switching: Handles natural Wolof-French/English mixing, mirroring real-world speech patterns
  • Multilingual: Performs well in French and English in addition to Wolof
  • Production-Ready: Thoroughly tested and optimized for deployment
  • Open Source: Released under the Apache 2.0 license, perfect for research and development
  • African NLP Focus: Contributing to the broader goal of comprehensive African language support

Performance Metrics

  • WER: 0.2345
  • CER: 0.1101

Lower values mean better accuracy, making the model well suited to practical applications.
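
For reference, both metrics can be reproduced on your own test set with a standard ASR metric library. The sketch below uses jiwer, which is our choice rather than the evaluation tooling documented for this card, and the transcript strings are hypothetical:

from jiwer import wer, cer

# Hypothetical reference transcript and model output (illustrative only)
reference = "dama bëgg dem dakar"
hypothesis = "dama bëgg dem ndakaaru"

# Word Error Rate: word-level edit distance divided by reference word count
print("WER:", wer(reference, hypothesis))

# Character Error Rate: the same computation at the character level
print("CER:", cer(reference, hypothesis))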

Performance Comparison

Metric   Whosper-large-v2   Whosper-large   Improvement
WER      0.2345             0.2423          3.2% better
CER      0.1101             0.1135          3.0% better

Key Features

  • Improved WER and CER compared to whosper-large
  • Optimized for Wolof and French recognition
  • Enhanced performance on bilingual content

Limitations

  • Reduced performance on English compared to whosper-large
  • Less effective for general multilingual content compared to whosper-large
  • Reduced accuracy on very poor-quality audio

Training Data

Trained on diverse Wolof speech data:

  • ALFFA Public Dataset
  • FLEURS Dataset
  • Bus Urbain Dataset
  • Anta Women TTS Dataset
  • Kallama Dataset

This diversity helps the model generalize across:

  • Speaking styles and dialects
  • Code-switching patterns
  • Gender and age groups
  • Recording conditions

Quick Start Guide

Installation

pip install git+https://github.com/sudoping01/whosper.git@v1.0.0

Basic Usage

from whosper import WhosperTranscriber

# Initialize the transcriber
transcriber = WhosperTranscriber(model_id="CAYTU/whosper-large-v2") 

# Transcribe an audio file
result = transcriber.transcribe_audio("path/to/your/audio.wav")
print(result)
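
Since the framework versions below list PEFT, the repository appears to host a PEFT adapter on top of Whisper-large-v2. The following is a minimal sketch under that assumption rather than an officially supported path; the use of librosa for audio loading is also our choice:

import torch
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

# Load the base model, then apply the Whosper adapter on top of it
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
model = PeftModel.from_pretrained(base, "CAYTU/whosper-large-v2")
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

# Whisper expects 16 kHz mono audio
audio, _ = librosa.load("path/to/your/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    predicted_ids = model.generate(input_features=inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])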

Training Results

Training Loss   Epoch    Step    Validation Loss
0.7575          0.9998    2354   0.7068
0.6429          1.9998    4708   0.6073
0.5468          2.9998    7062   0.5428
0.4439          3.9998    9416   0.4935
0.3208          4.9998   11770   0.4600
0.2394          5.9998   14124   0.4490

Framework Versions

  • PEFT: 0.14.1.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.5.1+cu124
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
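
This environment can be approximated with pip, with the caveat that the .dev0 entries were development snapshots only available from source. The pinning below is our sketch, not an official requirements file:

# Approximate the training environment (CUDA builds of torch may need
# the appropriate index URL for your system)
pip install "torch==2.5.1" "datasets==3.2.0" "tokenizers==0.21.0"
# Dev snapshots of transformers and peft must be installed from source
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/huggingface/peft.git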

Contributing to African NLP

Whosper-large-v2 embodies our commitment to open science and the advancement of African language technologies. We believe that by making cutting-edge speech recognition models freely available, we can accelerate NLP development across Africa.

Join our mission to democratize AI technology:

  • Open Science: Use and build upon our research - all code, models, and documentation are open source
  • Data Contribution: Share your Wolof speech datasets to help improve model performance
  • Research Collaboration: Integrate Whosper into your research projects and share your findings
  • Community Building: Help us create resources for African language processing
  • Educational Impact: Use Whosper in educational settings to train the next generation of African AI researchers

Together, we can ensure African languages are well-represented in the future of AI technology. Whether you're a researcher, developer, educator, or language enthusiast, your contributions can help bridge the technological divide.

License

Apache License 2.0

This model is released under the Apache 2.0 license to encourage research, commercial use, and innovation in African language technologies while ensuring proper attribution and patent protection. You are free to:

  • Use the model commercially
  • Modify and distribute the model
  • Create derivative works
  • Benefit from the license's express patent grant

Choosing Apache 2.0 aligns with our goals of open science and advancing African NLP while providing necessary protections for the community.

Citation

@misc{whosper2025,
  title={Whosper-large: A Multilingual ASR Model for Wolof with Enhanced Code-Switching Capabilities},
  author={Seydou DIALLO},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/CAYTU/whosper-large},
  version={1.0}
}

Acknowledgments

Developed by Seydou DIALLO at Caytu Robotics's AI Department, building on OpenAI's Whisper-large-v2. Special thanks to the Wolof-speaking community and contributors advancing African language technology.

Contact Us

For questions or support, contact us:

Email: sdiallo@caytu.com
