
# Vocabulary Trimmed [xlm-roberta-base](https://huggingface.co/xlm-roberta-base): `vocabtrimmer/xlm-roberta-base-trimmed-it`
This model is a trimmed version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) produced by [`vocabtrimmer`](https://github.com/asahi417/lm-vocab-trimmer), a tool that trims the vocabulary of language models to reduce model size.
The following table summarizes the trimming process.

|                            | xlm-roberta-base   | vocabtrimmer/xlm-roberta-base-trimmed-it   |
|:---------------------------|:-------------------|:-------------------------------------------|
| parameter_size_full        | 278,295,186        | 138,183,386                                |
| parameter_size_embedding   | 192,001,536        | 52,071,936                                 |
| vocab_size                 | 250,002            | 67,802                                     |
| compression_rate_full      | 100.0              | 49.65                                      |
| compression_rate_embedding | 100.0              | 27.12                                      |
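
The figures in the summary table can be cross-checked with a few lines of arithmetic. The sketch below assumes that the embedding parameter count is `vocab_size` times a hidden size of 768 (the value implied by the table's own numbers) and that the compression rates are the trimmed count as a percentage of the original count. The helper names are illustrative and not part of `vocabtrimmer`.

```python
# Figures copied from the summary table.
HIDDEN_SIZE = 768  # assumption: embedding width implied by 192,001,536 / 250,002

full_params = {"original": 278_295_186, "trimmed": 138_183_386}
vocab_size = {"original": 250_002, "trimmed": 67_802}


def embedding_params(n_vocab, hidden_size=HIDDEN_SIZE):
    """Parameters in a token embedding matrix of shape (n_vocab, hidden_size)."""
    return n_vocab * hidden_size


def compression_rate(trimmed, original):
    """Trimmed size as a percentage of the original, rounded to two decimals."""
    return round(100 * trimmed / original, 2)


emb_params = {name: embedding_params(n) for name, n in vocab_size.items()}
rate_full = compression_rate(full_params["trimmed"], full_params["original"])
rate_embedding = compression_rate(emb_params["trimmed"], emb_params["original"])

print(emb_params)
print(rate_full, rate_embedding)
```

Under these assumptions the computed values match the table rows `parameter_size_embedding`, `compression_rate_full` and `compression_rate_embedding`.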


The following table shows the parameters used to trim the vocabulary.

| language   | dataset                     | dataset_column   | dataset_name   | dataset_split   | target_vocab_size   |   min_frequency |
|:-----------|:----------------------------|:-----------------|:---------------|:----------------|:--------------------|----------------:|
| it         | vocabtrimmer/mc4_validation | text             | it             | validation      |                     |               2 |
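
To give a rough idea of how `min_frequency` and `target_vocab_size` could interact, here is a minimal sketch of frequency-based vocabulary selection. It is not the `vocabtrimmer` implementation: the function `select_vocab`, the whitespace tokenization, the `>=` threshold, and the reading of an empty `target_vocab_size` as "no cap" are all assumptions made for illustration.

```python
from collections import Counter


def select_vocab(tokenized_texts, min_frequency=2, target_vocab_size=None):
    """Keep tokens seen at least `min_frequency` times (hypothetical helper).

    If `target_vocab_size` is given, keep only that many of the most
    frequent surviving tokens; None is treated as "no cap" (assumption).
    """
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # most_common() sorts by descending frequency.
    kept = [tok for tok, n in counts.most_common() if n >= min_frequency]
    if target_vocab_size is not None:
        kept = kept[:target_vocab_size]
    return kept


# Toy Italian corpus, split on whitespace instead of a real subword tokenizer.
corpus = [
    "il gatto dorme".split(),
    "il cane dorme".split(),
    "il gatto mangia".split(),
]

kept = select_vocab(corpus, min_frequency=2)
print(kept)
```

In this toy corpus, `cane` and `mangia` each occur once and are dropped, while `il`, `gatto` and `dorme` survive the threshold of 2.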