# Vocabulary Trimmed [cardiffnlp/xlm-roberta-base-tweet-sentiment-en](https://huggingface.co/cardiffnlp/xlm-roberta-base-tweet-sentiment-en): `vocabtrimmer/xlm-roberta-base-tweet-sentiment-en-trimmed-en-5000`
This model is a trimmed version of [cardiffnlp/xlm-roberta-base-tweet-sentiment-en](https://huggingface.co/cardiffnlp/xlm-roberta-base-tweet-sentiment-en) produced by [`vocabtrimmer`](https://github.com/asahi417/lm-vocab-trimmer), a tool that trims the vocabulary of a language model to reduce its size.
The following table summarizes the trimming process.

|                            | cardiffnlp/xlm-roberta-base-tweet-sentiment-en   | vocabtrimmer/xlm-roberta-base-tweet-sentiment-en-trimmed-en-5000   |
|:---------------------------|:-------------------------------------------------|:-------------------------------------------------------------------|
| parameter_size_full        | 278,045,955                                      | 89,885,955                                                         |
| parameter_size_embedding   | 192,001,536                                      | 3,841,536                                                          |
| vocab_size                 | 250,002                                          | 5,002                                                              |
| compression_rate_full      | 100.0                                            | 32.33                                                              |
| compression_rate_embedding | 100.0                                            | 2.0                                                                |
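
The figures in the summary table are consistent with each other, which the short check below illustrates. It is only a sanity-check sketch: the hidden size of 768 is the standard embedding width of `xlm-roberta-base` and is an assumption here (the table does not state it), and the compression rates are read as percentages of the original parameter counts.

```python
# Sanity check of the summary table (a sketch, not part of vocabtrimmer).
# ASSUMPTION: hidden size 768, the usual xlm-roberta-base embedding width.
HIDDEN_SIZE = 768

original = {
    "vocab_size": 250_002,
    "parameter_size_full": 278_045_955,
    "parameter_size_embedding": 192_001_536,
}
trimmed = {
    "vocab_size": 5_002,
    "parameter_size_full": 89_885_955,
    "parameter_size_embedding": 3_841_536,
}


def compression_rate(after: int, before: int) -> float:
    """Size after trimming as a percentage of the size before trimming."""
    return round(100 * after / before, 2)


rate_full = compression_rate(
    trimmed["parameter_size_full"], original["parameter_size_full"]
)
rate_embedding = compression_rate(
    trimmed["parameter_size_embedding"], original["parameter_size_embedding"]
)

# Parameters outside the embedding matrix are untouched by trimming.
non_embedding_original = (
    original["parameter_size_full"] - original["parameter_size_embedding"]
)
non_embedding_trimmed = (
    trimmed["parameter_size_full"] - trimmed["parameter_size_embedding"]
)

print(rate_full, rate_embedding)
print(non_embedding_original == non_embedding_trimmed)
```

Under that assumption the embedding parameter count equals `vocab_size * 768` for both models, so the entire size reduction comes from the embedding matrix.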


The following table shows the parameters used to trim the vocabulary.

| language   | dataset                     | dataset_column   | dataset_name   | dataset_split   |   target_vocab_size |   min_frequency |
|:-----------|:----------------------------|:-----------------|:---------------|:----------------|--------------------:|----------------:|
| en         | vocabtrimmer/mc4_validation | text             | en             | validation      |                5000 |               2 |
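
To give a rough idea of how `target_vocab_size` and `min_frequency` could interact, here is a toy frequency-based selection. It is a simplified illustration, not `vocabtrimmer`'s actual implementation: the function `select_vocab`, the inclusive frequency threshold, and the handling of special tokens are all hypothetical.

```python
from collections import Counter


def select_vocab(tokenized_texts, target_vocab_size, min_frequency, special_tokens=()):
    """Toy vocabulary selection (hypothetical helper, not vocabtrimmer's API).

    Keeps the special tokens, then the most frequent remaining tokens that
    occur at least `min_frequency` times, up to `target_vocab_size` of them.
    """
    counts = Counter(tok for text in tokenized_texts for tok in text)
    # ASSUMPTION: the threshold is inclusive (count >= min_frequency).
    candidates = [
        (tok, n)
        for tok, n in counts.items()
        if n >= min_frequency and tok not in special_tokens
    ]
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    kept = [tok for tok, _ in candidates[:target_vocab_size]]
    return list(special_tokens) + kept


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]
vocab = select_vocab(
    corpus,
    target_vocab_size=2,
    min_frequency=2,
    special_tokens=("<s>", "</s>"),
)
print(vocab)  # ['<s>', '</s>', 'the', 'cat']
```

In this toy version the special tokens are kept on top of the target size, so the result can be slightly larger than `target_vocab_size`. That would be one possible reading of why the trimmed model reports a `vocab_size` of 5,002 for a target of 5000, but the card itself does not say so.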