Efficient Large Language Model Collection
Shortened LLMs from Depth Pruning; https://github.com/Nota-NetsPresso/shortened-llm
Shortened LLM is a family of depth-pruned large language models for efficient text generation; a minimal loading sketch follows the first table below.
| Source Model | Pruning Ratio | Pruning Criterion | Retraining Method | HF Models Link |
|---|---|---|---|---|
| Vicuna-v1.3-7B | 20% | PPL | CPT | nota-ai/cpt_st-vicuna-v1.3-5.5b-ppl |
| Vicuna-v1.3-7B | 45% | PPL | CPT | nota-ai/cpt_st-vicuna-v1.3-3.7b-ppl |
| Vicuna-v1.3-7B | 60% | PPL | CPT | nota-ai/cpt_st-vicuna-v1.3-2.7b-ppl |
| Vicuna-v1.3-7B | 80% | PPL | CPT | nota-ai/cpt_st-vicuna-v1.3-1.5b-ppl |
| Vicuna-v1.3-7B | 20% | PPL | CPT⇒LoRA | nota-ai/cpt-lora_st-vicuna-v1.3-5.5b-ppl |
| Vicuna-v1.3-7B | 45% | PPL | CPT⇒LoRA | nota-ai/cpt-lora_st-vicuna-v1.3-3.7b-ppl |
| Vicuna-v1.3-7B | 60% | PPL | CPT⇒LoRA | nota-ai/cpt-lora_st-vicuna-v1.3-2.7b-ppl |
| Vicuna-v1.3-7B | 80% | PPL | CPT⇒LoRA | nota-ai/cpt-lora_st-vicuna-v1.3-1.5b-ppl |
Figure: zero-shot performance over the course of training for models pruned from Vicuna-v1.3-7B at different pruning ratios. For each model size, the CPT duration was capped at two weeks; additional training could further improve quality.
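The released checkpoints are standard Hugging Face causal-LM repositories, so they can be loaded with the usual `transformers` API. Below is a minimal sketch (not taken from the official repository) that loads the 20%-pruned CPT checkpoint from the table above; it assumes `torch`, `transformers`, and `accelerate` are installed, and the prompt and generation settings are purely illustrative.

```python
# Minimal loading/generation sketch (illustrative; not the official example).
# Assumes torch, transformers, and accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nota-ai/cpt_st-vicuna-v1.3-5.5b-ppl"  # 20%-pruned Vicuna-v1.3-7B (CPT), from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # use torch.float32 on CPU
    device_map="auto",          # requires the accelerate package
)

prompt = "Explain depth pruning of large language models in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Any other model ID from the tables can be substituted the same way, assuming it is published as a full (merged) checkpoint.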
| Source Model | Pruning Ratio | Pruning Criterion | HF Models Link |
|---|---|---|---|
| LLaMA-1-7B | 20% | PPL | nota-ai/st-llama-1-5.5b-ppl |
| LLaMA-1-7B | 20% | Taylor+ | nota-ai/st-llama-1-5.5b-taylor |
| Vicuna-v1.3-7B | 20% | PPL | nota-ai/st-vicuna-v1.3-5.5b-ppl |
| Vicuna-v1.3-7B | 20% | Taylor+ | nota-ai/st-vicuna-v1.3-5.5b-taylor |
| Vicuna-v1.3-13B | 21% | PPL | nota-ai/st-vicuna-v1.3-10.5b-ppl |
| Vicuna-v1.3-13B | 21% | Taylor+ | nota-ai/st-vicuna-v1.3-10.5b-taylor |
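Because depth pruning removes entire Transformer blocks, the pruning ratios above should be directly visible in the checkpoints' depth. The short sketch below compares `num_hidden_layers` between a pruned checkpoint from the table and its source model by reading only the configs; it assumes the released repositories expose standard LLaMA-style configs and that `lmsys/vicuna-7b-v1.3` is the corresponding unpruned source repository.

```python
# Compare model depth via configs only (no weights downloaded).
# Assumption: standard LLaMA-style configs; lmsys/vicuna-7b-v1.3 taken as the unpruned source.
from transformers import AutoConfig

for repo in ["lmsys/vicuna-7b-v1.3", "nota-ai/st-vicuna-v1.3-5.5b-ppl"]:
    cfg = AutoConfig.from_pretrained(repo)
    print(f"{repo}: num_hidden_layers = {cfg.num_hidden_layers}")
```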
@article{kim2024shortened,
  title={Shortened LLaMA: Depth Pruning for Large Language Models with Comparison of Retraining Methods},
  author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
  journal={arXiv preprint arXiv:2402.02834},
  year={2024},
  url={https://arxiv.org/abs/2402.02834}
}

@article{kim2024mefomo,
  title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
  author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
  journal={ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)},
  year={2024},
  url={https://openreview.net/forum?id=18VGxuOdpu}
}