---
language:
- en
license: mit
tags:
- english
---
This is an English-only version of the [google/mt5-base](https://huggingface.co/google/mt5-base) model in which only a subset of the original token embeddings is kept.


* The `sentencepiece` vocabulary was shrunk from 250K to 20K tokens (the top 20K English tokens). This reduced the number of model parameters to 244M and the model size from 2.2GB to 0.9GB, 39% of the original size.

The approach was taken from the article [How to adapt a multilingual T5 model for a single language](https://cointegrated.medium.com/how-to-adapt-a-multilingual-t5-model-for-a-single-language-b9f94f3d9c90).
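
The vocabulary shrinking described above can be illustrated with a minimal, framework-free sketch. It is not the code used to build this model: the helper names `select_top_tokens` and `shrink_embeddings`, the toy corpus, and the tiny 10-row embedding table are all hypothetical, chosen only to show the idea of counting token frequencies, keeping the most frequent ids (plus any special tokens), and copying just those rows into a smaller embedding matrix.

```python
from collections import Counter


def select_top_tokens(tokenized_corpus, keep_n, always_keep=()):
    """Pick the old token ids to keep: special ids first, then the most frequent ones."""
    counts = Counter(tok for sentence in tokenized_corpus for tok in sentence)
    kept = list(always_keep)
    seen = set(kept)
    for tok, _ in counts.most_common():
        if len(kept) >= keep_n:
            break
        if tok not in seen:
            kept.append(tok)
            seen.add(tok)
    return kept


def shrink_embeddings(embeddings, kept_ids):
    """Build a smaller embedding table holding only the rows of the kept token ids."""
    return [embeddings[i] for i in kept_ids]


# Toy stand-ins (assumptions): a 10-token vocabulary with 2-dimensional embeddings
# and a corpus that has already been converted to token ids.
old_embeddings = [[float(i), float(i) * 0.5] for i in range(10)]
corpus = [[3, 5, 5, 7], [5, 3, 9], [3, 5, 2]]

# Keep 4 tokens in total; ids 0 and 1 play the role of special tokens.
kept_ids = select_top_tokens(corpus, keep_n=4, always_keep=(0, 1))
new_embeddings = shrink_embeddings(old_embeddings, kept_ids)

# Mapping from old ids to new ids, needed to rebuild the tokenizer vocabulary.
old_to_new = {old: new for new, old in enumerate(kept_ids)}
```

In a real model the same row selection would be applied to both the input embedding matrix and the output projection, and the tokenizer would be rebuilt so that it only produces the kept tokens.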