vocabtrimmer/xlm-v-base-trimmed-en

asahi417 committed on Mar 30, 2023

Commit 26d05e6 • 1 Parent(s): a4e0768

commit files to HF hub

Files changed (1)

README.md +18 -0

README.md ADDED

	@@ -0,0 +1,18 @@

+# Vocabulary Trimmed [facebook/xlm-v-base](https://huggingface.co/facebook/xlm-v-base): `vocabtrimmer/xlm-v-base-trimmed-en`
+This model is a trimmed version of [facebook/xlm-v-base](https://huggingface.co/facebook/xlm-v-base) produced by [`vocabtrimmer`](https://github.com/asahi417/lm-vocab-trimmer), a tool that trims the vocabulary of language models to reduce model size.
+The following table summarizes the trimming process; the compression rates are percentages of the original model's parameter counts.
+|                            | facebook/xlm-v-base   | vocabtrimmer/xlm-v-base-trimmed-en   |
+|:---------------------------|:----------------------|:-------------------------------------|
+| parameter_size_full        | 779,396,349           | 458,814,091                          |
+| parameter_size_embedding   | 692,451,072           | 372,285,696                          |
+| vocab_size                 | 901,629               | 484,747                              |
+| compression_rate_full      | 100.0                 | 58.87                                |
+| compression_rate_embedding | 100.0                 | 53.76                                |
+The following table shows the parameters used to trim the vocabulary.
+| language   | dataset                     | dataset_column   | dataset_name   | dataset_split   | target_vocab_size   |   min_frequency |
+|:-----------|:----------------------------|:-----------------|:---------------|:----------------|:--------------------|----------------:|
+| en         | vocabtrimmer/mc4_validation | text             | en             | validation      |                     |               2 |
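
The figures in the summary table can be cross-checked with a little arithmetic. The sketch below is not part of `vocabtrimmer`; it only recomputes the reported compression rates from the parameter counts in the table. The variable and function names are illustrative. The embedding width of 768 is inferred from the table (embedding parameters divided by vocabulary size) rather than stated in the README.

```python
# Values copied from the summary table in the README.
ORIGINAL = {"full": 779_396_349, "embedding": 692_451_072, "vocab": 901_629}
TRIMMED = {"full": 458_814_091, "embedding": 372_285_696, "vocab": 484_747}


def compression_rate(trimmed: int, original: int) -> float:
    """Size of the trimmed model as a percentage of the original, to 2 decimals."""
    return round(100 * trimmed / original, 2)


rate_full = compression_rate(TRIMMED["full"], ORIGINAL["full"])
rate_embedding = compression_rate(TRIMMED["embedding"], ORIGINAL["embedding"])

# Inferred, not stated in the README: embedding parameters / vocab size
# gives the embedding width, which should be the same for both models.
dim_original, rem_original = divmod(ORIGINAL["embedding"], ORIGINAL["vocab"])
dim_trimmed, rem_trimmed = divmod(TRIMMED["embedding"], TRIMMED["vocab"])

# Parameters outside the embedding matrix.
other_original = ORIGINAL["full"] - ORIGINAL["embedding"]
other_trimmed = TRIMMED["full"] - TRIMMED["embedding"]

print(rate_full, rate_embedding)          # matches the table: 58.87 53.76
print(dim_original, dim_trimmed)          # 768 768
print(other_original - other_trimmed)     # 416882, the number of removed tokens
```

The non-embedding parameters differ by exactly the number of removed vocabulary entries (901,629 − 484,747 = 416,882). This is consistent with one additional per-token parameter outside the embedding matrix, such as an output bias, but that reading is an assumption and is not confirmed by the README.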