
Cool-Whisper

Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data

Liang-Hsuan Tseng, Zih-Ching Chen, Wei-Shun Chang, Cheng-Kuang Lee, Tsung-Ren Huang, Hung-yi Lee

arXiv Open In Colab

⚠️ Due to privacy and security concerns, this model will be temporarily taken offline. We are sorry for the inconvenience.

Introduction

  • Cool-Whisper is a distilled version of Whisper, focused mainly on Mandarin-English code-switching automatic speech recognition (ASR) for speakers in Taiwan.
  • We train the model on 60,000 hours of unlabeled audio.
  • In practice, we draw on knowledge not only from the large model (Whisper-large-v2) but also from the small model (Whisper-base).
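This card does not describe how the large and small models are combined, so the following is only a hypothetical illustration, not the paper's method. One simple way a small model's output could be used on unlabeled code-switching audio is as an agreement check on the large model's pseudo-labels, scored with a mixed error rate (MER) that counts each Chinese character and each English word as one unit. The function names, the tokenization regex, and the 0.2 threshold are all assumptions.

```python
import re
from typing import List

# Hypothetical MER tokenizer: each CJK character is one unit, and each run of
# Latin letters, digits, or apostrophes is one (lower-cased) English word.
# Punctuation and whitespace are ignored.
_UNIT_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def mixed_units(text: str) -> List[str]:
    """Split mixed Mandarin-English text into MER units."""
    return [unit.lower() for unit in _UNIT_RE.findall(text)]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def mixed_error_rate(ref: str, hyp: str) -> float:
    """MER of `hyp` against `ref`, treating `ref` as the reference."""
    ref_units, hyp_units = mixed_units(ref), mixed_units(hyp)
    if not ref_units:
        return 0.0 if not hyp_units else 1.0
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def keep_pseudo_label(large_hyp: str, small_hyp: str, threshold: float = 0.2) -> bool:
    """Hypothetical filter: keep the large model's transcript as a training target
    only if the small model's transcript agrees with it within `threshold` MER."""
    return mixed_error_rate(large_hyp, small_hyp) <= threshold
```

In this sketch, an utterance whose two transcripts disagree too much would simply be dropped from the distillation set.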

Basic Usage

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Run on the GPU in half precision when available; otherwise use the CPU in full precision
device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "andybi7676/cool-whisper-hf"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, use_safetensors=True
)
processor = AutoProcessor.from_pretrained(model_id)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=256,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

dataset = load_dataset("andybi7676/ntuml2021_long", "default", split="test")
sample = dataset[0]["audio"]
# or your own audio path
# sample = "/your/path/to/audio.wav"

result = pipe(sample)
print("Basic result:")
print(result["text"])

# Each chunk is a dict with a "timestamp" (start, end) tuple in seconds and its "text"
print("\nResult with timestamps:")
for chunk in result["chunks"]:
    print(chunk)

Faster-Whisper Support

Faster-Whisper is a widely used tool, built on CTranslate2, for speeding up Whisper transcription. We also release our model in CTranslate2 format so that it can be used with faster-whisper. Please visit cool-whisper for more details.
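A minimal sketch of transcribing with faster-whisper, assuming you have already obtained the CTranslate2 version of the model as a local directory. The `model_path` default below is a hypothetical placeholder, not a published model ID (see cool-whisper for the actual files), and the wrapper function name is ours. It relies only on faster-whisper's `WhisperModel` class and its `transcribe()` method.

```python
from typing import List, Optional, Tuple


def transcribe_with_faster_whisper(
    audio_path: str,
    model_path: str = "path/to/cool-whisper-ct2",  # hypothetical placeholder path
    language: Optional[str] = None,
    beam_size: int = 5,
) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) segments for `audio_path` using faster-whisper."""
    # Imported lazily so this module can be loaded without faster-whisper installed
    from faster_whisper import WhisperModel

    # device="auto" uses CUDA when available; compute_type="default" keeps the
    # precision the model was converted with
    model = WhisperModel(model_path, device="auto", compute_type="default")
    # transcribe() returns a lazy generator of segments plus language info;
    # iterating over the generator is what actually runs decoding
    segments, _info = model.transcribe(audio_path, language=language, beam_size=beam_size)
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Call it as `transcribe_with_faster_whisper("/your/path/to/audio.wav", model_path=...)`. Passing `language="zh"` skips language detection; whether that helps on code-switched recordings is an assumption worth testing.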

Model size: 756M params (Safetensors, tensor type F32)