license: mit
datasets:
  - jacktol/atc-dataset
language:
  - en
metrics:
  - wer
base_model:
  - openai/whisper-medium.en
pipeline_tag: automatic-speech-recognition
tags:
  - aviation
  - atc
  - aircraft
  - communication
model-index:
  - name: Whisper Medium EN Fine-Tuned for ATC
    results:
      - task:
          type: automatic-speech-recognition
        dataset:
          name: ATC Dataset
          type: jacktol/atc-dataset
        metrics:
          - name: Word Error Rate (WER)
            type: wer
            value: 15.08
        source:
          name: ATC Transcription Evaluation
          url: https://jacktol.net/posts/fine-tuning_whisper_for_atc/

Whisper Medium EN Fine-Tuned for Air Traffic Control (ATC)

Model Overview

This model is a fine-tuned version of OpenAI's Whisper Medium EN model, trained on Air Traffic Control (ATC) communication datasets. Fine-tuning reduces the Word Error Rate (WER) on domain-specific aviation communications from 94.59% to 15.08%, a relative reduction of 84% compared with the original pretrained model. The model is particularly effective at handling the accent variations and ambiguous phrasing often encountered in ATC communications.

Base Model: OpenAI Whisper Medium EN
Fine-tuned Model WER: 15.08%
Pretrained Model WER: 94.59%
Relative Improvement: 84.06%
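
The relative improvement follows directly from the two WER figures above. As a minimal sketch (the function names are illustrative and not taken from the project's code), WER can be computed as the word-level edit distance divided by the number of reference words, and the relative reduction as the drop in WER divided by the pretrained WER. Any text normalization applied before scoring in the actual evaluation is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Row-by-row dynamic programming over the word-level Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Percentage by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer * 100


# Figures reported in this model card.
improvement = relative_wer_reduction(94.59, 15.08)
print(f"{improvement:.2f}%")  # 84.06%
```

Note that WER can exceed 100% when the hypothesis contains many insertions, so a pretrained WER close to 100% is not an upper bound.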

The fine-tuned model is available on Hugging Face.

Model Description

Whisper Medium EN fine-tuned for ATC is optimized to handle short, distinct transmissions between pilots and air traffic controllers.

The fine-tuned model demonstrates enhanced performance in interpreting various accents, recognizing non-standard phraseology, and processing noisy or distorted communications. It is highly suitable for aviation-related transcription tasks.

Intended Use

The fine-tuned Whisper model is designed for:

Transcribing aviation communication: Providing accurate transcriptions for ATC communications, including accents and variations in English phrasing.
Air Traffic Control Systems: Assisting in real-time transcription of pilot-ATC conversations, helping improve situational awareness.
Research and training: Useful for researchers, developers, or aviation professionals studying ATC communication or developing new tools for aviation safety.

You can test the model online using the ATC Transcription Assistant, which lets you upload audio files and generate transcriptions.
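
For local use, a fine-tuned Whisper checkpoint can usually be loaded through the Hugging Face transformers pipeline API. The following is a minimal sketch, not the author's own inference code: the repository ID is a placeholder (the actual ID is not stated in this card), and it assumes transformers and a PyTorch backend are installed, plus ffmpeg for decoding audio files.

```python
# Placeholder: replace with the actual Hugging Face repository ID of this model.
MODEL_ID = "your-namespace/whisper-medium-en-atc"  # hypothetical, not a real repo


def transcribe(audio_path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with a Whisper checkpoint from the Hub."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path)  # the pipeline returns a dict with a "text" key
    return result["text"].strip()
```

Calling transcribe("tower_clip.wav") with a real repository ID would return the transcription as a string. For many files, building the pipeline once and reusing it avoids reloading the model on every call.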

Training Data

The model is fine-tuned on the ATC Dataset, a combined and cleaned dataset drawn from the following sources:

ATCO2 corpus (1-hour test subset)
UWB-ATCC corpus

The ATC Dataset merges these two original sources, filtering and refining the data to enhance transcription accuracy for domain-specific ATC communications.

Training Procedure

Hardware: Fine-tuning was conducted on two 80 GB A100 GPUs.
Epochs: 10
Learning Rate: 1e-5
Batch Size: 32 (effective batch size with gradient accumulation)
Augmentation: Dynamic data augmentation techniques (Gaussian noise, pitch shifting, etc.) were applied during training.
Evaluation Metric: Word Error Rate (WER)
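
The effective batch size is the product of the per-GPU batch size, the number of GPUs, and the number of gradient accumulation steps. The sketch below collects the hyperparameters listed above and derives the accumulation steps; the per-GPU batch size of 4 is an assumption made for illustration, since the card only reports the effective value of 32, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    # Values stated in this model card.
    num_gpus: int = 2
    epochs: int = 10
    learning_rate: float = 1e-5
    effective_batch_size: int = 32
    # Assumption: the card does not give the per-GPU batch size.
    per_device_batch_size: int = 4

    @property
    def gradient_accumulation_steps(self) -> int:
        """Accumulation steps needed to reach the effective batch size."""
        samples_per_step = self.per_device_batch_size * self.num_gpus
        if self.effective_batch_size % samples_per_step:
            raise ValueError("effective batch size must be a multiple of "
                             "per_device_batch_size * num_gpus")
        return self.effective_batch_size // samples_per_step


config = FineTuneConfig()
print(config.gradient_accumulation_steps)  # 32 / (4 * 2) = 4
```

Other splits give the same effective batch size, for example 16 samples per GPU with no accumulation; the choice mainly depends on how much fits in GPU memory.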

Limitations

While the fine-tuned model performs well on ATC-specific communications, it may not generalize as effectively to other domains of speech. Additionally, as with most speech-to-text models, transcription accuracy can degrade on extremely poor-quality audio or on heavily accented speech not encountered during training.

References

Blog Post: Fine-Tuning Whisper for ATC: 84% Improvement in Transcription Accuracy
GitHub Repository: Fine-Tuning Whisper on ATC Data