Update README.md

This is a pruned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) for the Russian language.

Pruning was done without any fine-tuning, using the method from [this post](https://medium.com/m/global-identity-2?redirectUrl=https%3A%2F%2Ftowardsdatascience.com%2Fhow-to-adapt-a-multilingual-t5-model-for-a-single-language-b9f94f3d9c90).

## Size
Only 10% of the tokens were kept, including the special Whisper tokens, the added Whisper tokens, the 100 most popular tokens from the tokenizer, and the 3000 most popular Russian tokens obtained by tokenizing a Russian text corpus.

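As an illustration, here is a minimal sketch of that recipe (not the exact pruning script; the corpus path is a placeholder, and the step that adds the 100 generally most popular tokenizer tokens is omitted for brevity):

```python
# A minimal sketch of the vocabulary-pruning recipe, not the exact script.
# "corpus_ru.txt" is a placeholder for a plain-text Russian corpus.
from collections import Counter

import torch
from transformers import WhisperForConditionalGeneration, WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-base")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")

# 1. Count token frequencies over the Russian corpus.
counter = Counter()
with open("corpus_ru.txt", encoding="utf-8") as f:
    for line in f:
        counter.update(tokenizer.encode(line, add_special_tokens=False))

# 2. Keep the special/added Whisper tokens plus the most frequent tokens.
keep_ids = set(tokenizer.all_special_ids)
keep_ids.update(i for i, _ in counter.most_common(3000))
keep_ids = sorted(keep_ids)

# 3. Slice the decoder embedding matrix down to the kept rows and re-tie
#    proj_out to it (Whisper shares these weights).
d_model = model.config.d_model
new_embed = torch.nn.Embedding(len(keep_ids), d_model)
new_embed.weight.data = model.model.decoder.embed_tokens.weight.data[keep_ids]
model.model.decoder.embed_tokens = new_embed
model.proj_out = torch.nn.Linear(d_model, len(keep_ids), bias=False)
model.proj_out.weight = new_embed.weight
model.config.vocab_size = len(keep_ids)
# The tokenizer vocabulary must be rebuilt to match the new token ids.
```
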
Model size is 30% smaller than the original whisper-base:

|  | openai/whisper-base | waveletdeboshir/whisper-base-ru-pruned |
| :------ | :------ | :------ |
| n of parameters | 74 M | 48.5 M |
| n of parameters (with proj_out layer) | 99 M | 51 M |
| model file size | 290 MB | 203 MB |
| vocab_size | 51865 | 4705 |

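These counts can be sanity-checked in a few lines (a sketch using transformers; `model.parameters()` counts tied weights once, so expect numbers close to, rather than exactly matching, the first row):

```python
# Print approximate parameter counts and vocab sizes for both checkpoints.
from transformers import WhisperForConditionalGeneration

for name in ["openai/whisper-base", "waveletdeboshir/whisper-base-ru-pruned"]:
    model = WhisperForConditionalGeneration.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M params, vocab_size={model.config.vocab_size}")
```
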
## Metrics
Metrics for this model are on par with those of openai/whisper-base.

You can fine-tune this model on your data to achieve better performance.
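
Whether or not you fine-tune it, the model can be used through the standard transformers Whisper API; a minimal transcription sketch ("audio.wav" is a placeholder, and the input is resampled to the 16 kHz Whisper expects):

```python
# Transcription sketch; "audio.wav" is a placeholder for your own recording.
import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("waveletdeboshir/whisper-base-ru-pruned")
model = WhisperForConditionalGeneration.from_pretrained("waveletdeboshir/whisper-base-ru-pruned")

wav, sr = torchaudio.load("audio.wav")
if sr != 16000:  # Whisper expects 16 kHz mono input
    wav = torchaudio.functional.resample(wav, sr, 16000)

inputs = processor(wav.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```
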
## Colab for pruning
TODO