# Vocabulary Trimmed [lmqg/mbart-large-cc25-squad-qa](https://huggingface.co/lmqg/mbart-large-cc25-squad-qa): `vocabtrimmer/mbart-large-cc25-squad-qa-trimmed-en`
This model is a trimmed version of [lmqg/mbart-large-cc25-squad-qa](https://huggingface.co/lmqg/mbart-large-cc25-squad-qa) produced by [`vocabtrimmer`](https://github.com/asahi417/lm-vocab-trimmer), a tool that trims the vocabulary of a language model to reduce the model size.
The following table summarizes the trimming process. Compression rates are the trimmed size as a percentage of the original size.

|                            | lmqg/mbart-large-cc25-squad-qa   | vocabtrimmer/mbart-large-cc25-squad-qa-trimmed-en   |
|:---------------------------|:---------------------------------|:----------------------------------------------------|
| parameter_size_full        | 610,852,864                      | 532,235,264                                         |
| parameter_size_embedding   | 256,028,672                      | 177,411,072                                         |
| vocab_size                 | 250,028                          | 173,253                                             |
| compression_rate_full      | 100.0                            | 87.13                                               |
| compression_rate_embedding | 100.0                            | 69.29                                               |
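The compression rates in the table can be reproduced from the parameter counts. The snippet below is a small arithmetic check, not part of `vocabtrimmer`; the dictionary and function names are illustrative, and the reading of the embedding size as "one row per vocabulary entry" is an assumption.

```python
# Figures copied from the summary table above.
ORIGINAL = {"full": 610_852_864, "embedding": 256_028_672, "vocab": 250_028}
TRIMMED = {"full": 532_235_264, "embedding": 177_411_072, "vocab": 173_253}


def compression_rate(trimmed: int, original: int) -> float:
    """Return the trimmed size as a percentage of the original (2 decimals)."""
    return round(100 * trimmed / original, 2)


rate_full = compression_rate(TRIMMED["full"], ORIGINAL["full"])
rate_embedding = compression_rate(TRIMMED["embedding"], ORIGINAL["embedding"])

# Assumption: the embedding matrix has one row per vocabulary entry, so
# dividing its parameter count by the vocabulary size gives the row width.
embedding_dim = ORIGINAL["embedding"] // ORIGINAL["vocab"]

# Parameters removed overall and from the embedding matrix alone.
saved_full = ORIGINAL["full"] - TRIMMED["full"]
saved_embedding = ORIGINAL["embedding"] - TRIMMED["embedding"]

print(rate_full, rate_embedding, embedding_dim, saved_full == saved_embedding)
```

The two savings are identical (78,617,600 parameters), which is consistent with the trimming affecting only the embedding matrix and leaving the rest of the network untouched.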


The following table shows the parameters used to trim the vocabulary.

| language   | dataset                     | dataset_column   | dataset_name   | dataset_split   | target_vocab_size   |   min_frequency |
|:-----------|:----------------------------|:-----------------|:---------------|:----------------|:--------------------|----------------:|
| en         | vocabtrimmer/mc4_validation | text             | en             | validation      |                     |               2 |
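To illustrate how `min_frequency` and `target_vocab_size` could interact, here is a minimal sketch of frequency-based vocabulary selection. It is not the `vocabtrimmer` implementation: the function `select_vocab` is hypothetical, whitespace splitting stands in for the model's tokenizer, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from collections import Counter


def select_vocab(corpus, min_frequency=2, target_vocab_size=None):
    """Hypothetical helper: keep tokens seen at least `min_frequency` times.

    Whitespace splitting is a stand-in for a real subword tokenizer.
    """
    counts = Counter(token for text in corpus for token in text.split())
    # most_common() orders tokens by descending frequency.
    kept = [tok for tok, n in counts.most_common() if n >= min_frequency]
    # An unset target size (as in the table above) means no further cap.
    if target_vocab_size is not None:
        kept = kept[:target_vocab_size]
    return kept


corpus = ["the cat sat", "the cat ran", "a dog"]
vocab = select_vocab(corpus, min_frequency=2)
capped = select_vocab(corpus, min_frequency=2, target_vocab_size=1)
print(vocab, capped)
```

With the toy corpus, only `the` and `cat` occur twice, so they are the only tokens retained; setting a target size of 1 keeps just the most frequent one.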