metadata

license: apache-2.0
language:
  - ru
library_name: transformers
pipeline_tag: automatic-speech-recognition
tags:
  - asr
  - Pytorch
  - pruned
  - audio
  - automatic-speech-recognition

Whisper-base-ru-pruned

Model info

This is a pruned version of openai/whisper-base model with only russian tokens left. Pruning was made without any fine-tuning. Method from this post was used.

Size

Only 10% tokens was left including special whisper tokens, added whisper tokens, 100 most popular tokens from tokenizer and 3000 most popular Russian tokens computed by tokenization of russian text corpus.

Model size is 30% less then original whisper-base:

	openai/whisper-base	waveletdeboshir/whisper-base-ru-pruned
n of parameters	74 M	48.5 M
n of parameters (with proj_out layer)	99 M	51 M
model file size	290 Mb	203 Mb
vocab_size	51865	4705

Usage

Model can be used as an original whisper:

>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
>>> import torchaudio

>>> # load audio
>>> wav, sr = torchaudio.load("audio.wav")

>>> # load model and processor
>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-base-ru-pruned")
>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-base-ru-pruned")

>>> input_features = processor(wav[0], sampling_rate=sr, return_tensors="pt").input_features 

>>> # generate token ids
>>> predicted_ids = model.generate(input_features)
>>> # decode token ids to text
>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
['<|startoftranscript|><|ru|><|transcribe|><|notimestamps|> Начинаем работу.<|endoftext|>']

The context tokens can be removed from the start of the transcription by setting skip_special_tokens=True.

Other pruned whisper models

Metrics

Metrics for this model are on the same level as for openai/whisper-base.

You can fine-tune this model on your data to achive better performance.

Colab for pruning

TODO