	---
	license: apache-2.0
	language:
	- ru
	library_name: transformers
	pipeline_tag: automatic-speech-recognition
	base_model: waveletdeboshir/whisper-base-ru-pruned
	tags:
	- asr
	- Pytorch
	- pruned
	- finetune
	- audio
	- automatic-speech-recognition
	model-index:
	- name: Whisper Base Pruned and Finetuned for Russian
	  results:
	  - task:
	      name: Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Common Voice 15.0 (Russian part, test)
	      type: mozilla-foundation/common_voice_15_0
	      args: ru
	    metrics:
	    - name: WER
	      type: wer
	      value: 26.52
	  - task:
	      name: Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: Common Voice 15.0 (Russian part, test)
	      type: mozilla-foundation/common_voice_15_0
	      args: ru
	    metrics:
	    - name: WER (without punctuation)
	      type: wer
	      value: 21.35
	datasets:
	- mozilla-foundation/common_voice_15_0
	---

	# Whisper-base-ru-pruned-ft

	## Model info
	This is a finetuned version of the pruned whisper-base model ([waveletdeboshir/whisper-base-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-base-ru-pruned)) for the Russian language.

	The model was finetuned on the Russian part of [mozilla-foundation/common_voice_15_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_15_0) with three augmentations: SpecAugment, colored noise, and noise mixed in from audio files.
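To illustrate the first of these augmentations, here is a minimal SpecAugment-style masking sketch in plain Python. It is not the training code used for this model: the function name `spec_augment`, the mask widths, and the use of a single frequency mask and a single time mask are all assumptions made for the example.

```python
import random


def spec_augment(spec, max_freq_mask=2, max_time_mask=3, rng=None):
    """Return a copy of `spec` (rows = frequency bins, columns = time frames)
    with one random frequency band and one random time span set to zero.

    The mask widths are illustrative defaults, not the values used for this model.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: zero out a band of consecutive frequency bins.
    f_width = rng.randint(0, min(max_freq_mask, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [0.0] * n_time

    # Time mask: zero out a span of consecutive time frames in every bin.
    t_width = rng.randint(0, min(max_time_mask, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = 0.0
    return out


# Toy "spectrogram": 8 frequency bins x 10 time frames, all ones.
spec = [[1.0] * 10 for _ in range(8)]
masked = spec_augment(spec, rng=random.Random(0))
print(sum(v == 0.0 for row in masked for v in row), "cells masked")
```

In a real pipeline the same idea would be applied to the log-mel features on each training step, so the model sees a differently masked input every time.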

	## Metrics

	| metric | dataset | waveletdeboshir/whisper-base-ru-pruned | waveletdeboshir/whisper-base-ru-pruned-ft |
	| :------ | :------ | :------ | :------ |
	| WER (without punctuation) | common_voice_15_0_test | 0.3352 | 0.2135 |
	| WER | common_voice_15_0_test | 0.4050 | 0.2652 |
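The two rows differ only in whether punctuation counts toward errors. The sketch below shows how word error rate (WER) could be computed both ways. It is an illustration, not the evaluation script behind the table: the helper names and the exact normalization (removing every character that is not a word character or whitespace) are assumptions.

```python
import re


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def strip_punctuation(text):
    # Assumed normalization: drop everything except word characters and whitespace.
    return re.sub(r"[^\w\s]", "", text)


def wer(reference, hypothesis, ignore_punctuation=False):
    """WER = word-level edit distance / number of reference words."""
    if ignore_punctuation:
        reference = strip_punctuation(reference)
        hypothesis = strip_punctuation(hypothesis)
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = "Начинаем работу."
hypothesis = "Начинаем работу"
print(wer(reference, hypothesis))                           # 0.5
print(wer(reference, hypothesis, ignore_punctuation=True))  # 0.0
```

A missing full stop makes "работу." and "работу" count as different words, which is why the punctuation-aware WER is always at least as high as the punctuation-free one.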

	## Limitations
	Because the Common Voice transcripts contain only letters and punctuation marks (no digits or other special characters), the model has lost the ability to predict numbers and special characters.

	## Size
	Only about 8% of the original vocabulary was kept (4207 of 51865 tokens): the special Whisper tokens (no language tokens except `<|ru|>` and `<|en|>`, no timestamp tokens), the 200 most popular tokens from the tokenizer, and the 4000 most popular Russian tokens, determined by tokenizing a Russian text corpus.
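The general idea of this kind of vocabulary pruning can be sketched as follows: count token frequencies over a corpus, keep the special tokens plus the most frequent ones, renumber the survivors, and keep only their embedding rows. This is a toy illustration with made-up ids and hypothetical helper names (`build_pruned_vocab`, `prune_embedding`), not the script used to produce this model.

```python
from collections import Counter


def build_pruned_vocab(corpus_token_ids, special_ids, top_k):
    """Choose which token ids to keep and map them to new, contiguous ids."""
    counts = Counter(corpus_token_ids)
    special = set(special_ids)
    # Most frequent non-special tokens, limited to top_k.
    frequent = [tok for tok, _ in counts.most_common() if tok not in special][:top_k]
    kept = sorted(special | set(frequent))
    old_to_new = {old: new for new, old in enumerate(kept)}
    return kept, old_to_new


def prune_embedding(embedding, kept):
    """Keep only the embedding rows that belong to surviving tokens."""
    return [embedding[old] for old in kept]


# Toy setup: a 10-token vocabulary with 2-dimensional embeddings.
embedding = [[float(i), float(i) / 10] for i in range(10)]
corpus_token_ids = [3, 3, 3, 7, 7, 5, 3, 7, 1]
special_ids = [0, 9]  # stand-ins for special tokens that must always survive

kept, old_to_new = build_pruned_vocab(corpus_token_ids, special_ids, top_k=2)
pruned = prune_embedding(embedding, kept)
print(kept)         # [0, 3, 7, 9]
print(old_to_new)   # {0: 0, 3: 1, 7: 2, 9: 3}
print(len(pruned))  # 4
```

In a real model the tokenizer files and the output projection would need the same renumbering, so that ids produced by the decoder still map to the right strings.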

	The model is about 30% smaller than the original whisper-base:

	| | openai/whisper-base | waveletdeboshir/whisper-base-ru-pruned-ft |
	| :------ | :------ | :------ |
	| number of parameters | 74 M | 48 M |
	| number of parameters (with proj_out layer) | 99 M | 50 M |
	| model file size | 290 MB | 193 MB |
	| vocab_size | 51865 | 4207 |
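A rough back-of-the-envelope check shows where most of the savings come from. The token embedding matrix has one row per vocabulary entry, so shrinking the vocabulary shrinks it proportionally. The calculation below assumes a hidden size of 512 for whisper-base (taken from the published model configuration, not from this card) and ignores every other layer.

```python
D_MODEL = 512  # assumed hidden size of whisper-base


def embedding_params(vocab_size, d_model=D_MODEL):
    """Number of weights in a vocab_size x d_model embedding matrix."""
    return vocab_size * d_model


original = embedding_params(51865)
pruned = embedding_params(4207)
print(original)           # 26554880
print(pruned)             # 2153984
print(original - pruned)  # 24400896
```

That is roughly 24 M fewer parameters in the embedding matrix, and about twice that when the `proj_out` layer is counted separately, which is broadly in line with the rounded figures in the table.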

	## Usage
	The model can be used in the same way as the original Whisper:

	```python
	>>> from transformers import WhisperProcessor, WhisperForConditionalGeneration
	>>> import torchaudio

	>>> # load audio
	>>> wav, sr = torchaudio.load("audio.wav")
	>>> # the Whisper feature extractor expects 16 kHz audio, so resample first
	>>> wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=16000)

	>>> # load model and processor
	>>> processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-base-ru-pruned-ft")
	>>> model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-base-ru-pruned-ft")

	>>> # compute input features from the first audio channel
	>>> input_features = processor(wav[0], sampling_rate=16000, return_tensors="pt").input_features

	>>> # generate token ids
	>>> predicted_ids = model.generate(input_features)
	>>> # decode token ids to text
	>>> transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
	>>> transcription
	['<|startoftranscript|><|ru|><|transcribe|><|notimestamps|> Начинаем работу.<|endoftext|>']

	```
	The context tokens can be removed from the start of the transcription by setting `skip_special_tokens=True`.
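To make the effect concrete, the snippet below removes the `<|...|>` markers from the raw string shown above with a regular expression. It only illustrates what the cleaned text looks like: the tokenizer itself skips special tokens by id during decoding, and the helper `strip_special_tokens` is a hypothetical name used for this example.

```python
import re

# Matches Whisper-style markers such as <|ru|> or <|endoftext|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def strip_special_tokens(text):
    """Remove <|...|> markers and surrounding whitespace from a decoded string."""
    return SPECIAL_TOKEN.sub("", text).strip()


raw = "<|startoftranscript|><|ru|><|transcribe|><|notimestamps|> Начинаем работу.<|endoftext|>"
print(strip_special_tokens(raw))  # Начинаем работу.
```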

	## Other pruned whisper models
	* [waveletdeboshir/whisper-tiny-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-tiny-ru-pruned)
	* [waveletdeboshir/whisper-small-ru-pruned](https://huggingface.co/waveletdeboshir/whisper-small-ru-pruned)