---
license: apache-2.0
language:
- 'no'
- en
---
This is a pruned version of the google/mt5-large model, in which the input and output embeddings have been pruned to support a greatly reduced vocabulary.
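Pruning an embedding matrix comes down to keeping only the rows for the retained token ids and renumbering those ids contiguously. The sketch below illustrates that idea on a toy matrix in plain Python. It is not the script used to produce this model. The function names `prune_embeddings` and `remap_ids` are hypothetical. It assumes the same id selection is applied to both the input embeddings and the output (LM head) matrix, and that the tokenizer is rebuilt separately (not shown) so it emits the new ids. Kept ids are sorted so low ids such as mT5's pad/eos/unk tokens (0, 1, 2) stay where the model config expects them.

```python
from typing import Dict, List, Sequence, Tuple


def prune_embeddings(
    embeddings: Sequence[Sequence[float]],
    keep_ids: Sequence[int],
) -> Tuple[List[List[float]], Dict[int, int]]:
    """Keep only the embedding rows whose token ids appear in keep_ids.

    Returns the pruned matrix and a mapping from old token id to new token id.
    Hypothetical helper for illustration only.
    """
    # Sort and deduplicate so new ids follow the original id order
    # (e.g. special tokens at ids 0, 1, 2 keep those ids).
    kept = sorted(set(keep_ids))
    for token_id in kept:
        if not 0 <= token_id < len(embeddings):
            raise ValueError(f"token id {token_id} is out of range")
    old_to_new = {old: new for new, old in enumerate(kept)}
    pruned = [list(embeddings[old]) for old in kept]
    return pruned, old_to_new


def remap_ids(token_ids: Sequence[int], old_to_new: Dict[int, int]) -> List[int]:
    """Translate token ids from the old vocabulary into the pruned vocabulary."""
    return [old_to_new[t] for t in token_ids]


# Toy example: a 6-token vocabulary with 2-dimensional embeddings.
toy_input_emb = [[float(i), float(i) * 10] for i in range(6)]
toy_output_emb = [[-float(i), -float(i) * 10] for i in range(6)]
keep = [0, 1, 4]

# Apply the same selection to the input and output matrices.
pruned_input, mapping = prune_embeddings(toy_input_emb, keep)
pruned_output, _ = prune_embeddings(toy_output_emb, keep)

print(pruned_input)
print(remap_ids([4, 1, 0], mapping))
```

In a real checkpoint the matrices would be framework tensors and the row selection would be a single indexing operation, but the bookkeeping (sorted kept ids, old-to-new mapping, identical selection for both matrices) is the same.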
The chosen vocabulary contains 30K Norwegian, English and special tokens, about 12% of the original mT5 vocabulary of roughly 250K tokens. This reduces the model size by roughly 37%.
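The size reduction can be checked with a quick back-of-the-envelope calculation. This is a sketch under stated assumptions that do not come from this card: mt5-large is taken to have `d_model = 1024`, an original vocabulary of 250,112 tokens, untied input and output embeddings, and about 1.23B parameters in total.

```python
# Rough estimate of the parameter savings from vocabulary pruning.
# All constants below are assumptions about mt5-large, not values from this card.
D_MODEL = 1024          # hidden size (assumed)
OLD_VOCAB = 250_112     # original mT5 vocabulary size (assumed)
NEW_VOCAB = 30_000      # pruned vocabulary size, as stated above
TOTAL_PARAMS = 1.23e9   # approximate total parameter count (assumed)


def embedding_params(vocab_size: int, d_model: int = D_MODEL) -> int:
    """Parameters in the input embedding plus the untied output projection."""
    return 2 * vocab_size * d_model


removed = embedding_params(OLD_VOCAB) - embedding_params(NEW_VOCAB)
vocab_fraction = NEW_VOCAB / OLD_VOCAB
size_reduction = removed / TOTAL_PARAMS

print(f"kept vocabulary: {vocab_fraction:.1%}")
print(f"parameters removed: {removed / 1e6:.0f}M")
print(f"size reduction: {size_reduction:.0%}")
```

Under these assumptions about 451M embedding parameters are removed, which is consistent with the figures of roughly 12% of the vocabulary and roughly 37% of the model size.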
The model still works reasonably well on closely related languages such as German and Danish, but it no longer understands very different languages such as Arabic.
This model is intended as a starting point for fine-tuning mT5 for Norwegian applications.