
# Shortened LLaMA Model Card

Shortened LLaMA is a depth-pruned version of LLaMA models and their variants for efficient text generation.

- Developed by: [Nota AI](https://www.nota.ai/)
- License: Non-commercial license
- Repository: https://github.com/Nota-NetsPresso/shortened-llm
- Paper: https://arxiv.org/abs/2402.02834

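## Compression Method
We first identify unimportant Transformer blocks, using either a perplexity (PPL) or a Taylor+ criterion as listed in the table below. We then remove them with one-shot pruning and apply light LoRA-based retraining.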
<details>
<summary>
Click to see a method figure.
</summary>

<img alt="method" src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/compressed-llm/st-llama_method.png" width="100%">

</details>
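The block-selection and retraining steps described above can be illustrated with a minimal plain-Python sketch. It is not the repository's implementation and rests on stated assumptions: Transformer blocks are stand-in list items, the PPL criterion is modeled as "perplexity of the model with only that block removed" through a hypothetical caller-supplied `eval_ppl_without` callable, and an optional, hypothetical `protected` argument shows how some blocks could be kept out of the candidate set. The LoRA part only shows how a trained low-rank update, scaled by `alpha / r` as in standard LoRA, is merged back into a dense weight. All function names (`ppl_block_scores`, `select_blocks_to_prune`, `prune_blocks`, `merge_lora`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def ppl_block_scores(num_blocks: int, eval_ppl_without: Callable[[int], float]) -> List[float]:
    """Score block i by the perplexity of the model with only block i removed.

    `eval_ppl_without` is a hypothetical callable supplied by the caller.
    A higher perplexity after removal means the block is more important.
    """
    return [eval_ppl_without(i) for i in range(num_blocks)]


def select_blocks_to_prune(scores: Sequence[float], num_to_prune: int,
                           protected: Sequence[int] = ()) -> List[int]:
    """Return the indices of the `num_to_prune` lowest-scoring blocks, ascending.

    `protected` (hypothetical) lists block indices that are never pruned.
    """
    keep_out = set(protected)
    candidates = [i for i in range(len(scores)) if i not in keep_out]
    if num_to_prune > len(candidates):
        raise ValueError("not enough prunable blocks")
    ranked = sorted(candidates, key=lambda i: scores[i])  # least important first
    return sorted(ranked[:num_to_prune])


def prune_blocks(blocks: Sequence[T], to_remove: Sequence[int]) -> List[T]:
    """One-shot depth pruning: drop the selected blocks, keep the rest in order."""
    drop = set(to_remove)
    return [b for i, b in enumerate(blocks) if i not in drop]


def merge_lora(W: List[List[float]], A: List[List[float]], B: List[List[float]],
               alpha: float, r: int) -> List[List[float]]:
    """Merge a LoRA update into a dense weight: W + (alpha / r) * (B @ A).

    Shapes (lists of lists): W is d_out x d_in, B is d_out x r, A is r x d_in.
    """
    scale = alpha / r
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(len(W[0]))]
        for i in range(len(W))
    ]


# Toy example: six blocks with made-up importance scores (illustrative only).
scores = [9.0, 5.1, 5.3, 7.2, 5.0, 8.0]
to_remove = select_blocks_to_prune(scores, num_to_prune=2)
print(to_remove)  # [1, 4]
print(prune_blocks([f"block{i}" for i in range(6)], to_remove))  # ['block0', 'block2', 'block3', 'block5']
```

Sorting the final index list keeps the surviving blocks in their original order, which matters because depth pruning removes whole blocks but must not reorder the ones that remain.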

## Model Links
| Source<br>Model | Pruning<br>Ratio | Pruning<br>Criterion | HF Models<br>Link |
|:---:|:---:|:---:|:---:|
| LLaMA-1-7B | 20% | PPL | [nota-ai/st-llama-1-5.5b-ppl](https://huggingface.co/nota-ai/st-llama-1-5.5b-ppl) |
| LLaMA-1-7B | 20% | Taylor+ | [nota-ai/st-llama-1-5.5b-taylor](https://huggingface.co/nota-ai/st-llama-1-5.5b-taylor) |
| Vicuna-v1.3-7B | 20% | PPL | [nota-ai/st-vicuna-v1.3-5.5b-ppl](https://huggingface.co/nota-ai/st-vicuna-v1.3-5.5b-ppl) |
| Vicuna-v1.3-7B | 20% | Taylor+ | [nota-ai/st-vicuna-v1.3-5.5b-taylor](https://huggingface.co/nota-ai/st-vicuna-v1.3-5.5b-taylor) |
| Vicuna-v1.3-13B | 21% | PPL | [nota-ai/st-vicuna-v1.3-10.5b-ppl](https://huggingface.co/nota-ai/st-vicuna-v1.3-10.5b-ppl) |
| Vicuna-v1.3-13B | 21% | Taylor+ | [nota-ai/st-vicuna-v1.3-10.5b-taylor](https://huggingface.co/nota-ai/st-vicuna-v1.3-10.5b-taylor) |
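The checkpoints in the table can presumably be loaded with the standard Hugging Face `transformers` auto classes. The sketch below assumes each repository loads as an ordinary LLaMA-architecture causal LM whose config already records the reduced number of blocks. `MODEL_IDS`, `pick_model_id`, and `generate_text` are hypothetical helpers. The dtype choice, greedy decoding, and `max_new_tokens` default are arbitrary, and the repository should be checked for its own loading code.

```python
from typing import Dict, Tuple

# Hugging Face model IDs from the "Model Links" table, keyed by (source model, criterion).
MODEL_IDS: Dict[Tuple[str, str], str] = {
    ("LLaMA-1-7B", "PPL"): "nota-ai/st-llama-1-5.5b-ppl",
    ("LLaMA-1-7B", "Taylor+"): "nota-ai/st-llama-1-5.5b-taylor",
    ("Vicuna-v1.3-7B", "PPL"): "nota-ai/st-vicuna-v1.3-5.5b-ppl",
    ("Vicuna-v1.3-7B", "Taylor+"): "nota-ai/st-vicuna-v1.3-5.5b-taylor",
    ("Vicuna-v1.3-13B", "PPL"): "nota-ai/st-vicuna-v1.3-10.5b-ppl",
    ("Vicuna-v1.3-13B", "Taylor+"): "nota-ai/st-vicuna-v1.3-10.5b-taylor",
}


def pick_model_id(source_model: str, criterion: str) -> str:
    """Look up the pruned checkpoint for a source model and pruning criterion."""
    try:
        return MODEL_IDS[(source_model, criterion)]
    except KeyError:
        raise ValueError(
            f"no checkpoint listed for {source_model!r} with criterion {criterion!r}"
        ) from None


def generate_text(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load a checkpoint (downloads the weights) and greedily generate a continuation."""
    # Imported lazily so the lookup helpers above work without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # fp16 on GPU to save memory; fp32 on CPU, where fp16 kernels are limited.
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate_text(pick_model_id("LLaMA-1-7B", "Taylor+"), "Depth pruning is")` would fetch and run the 20%-pruned Taylor+ LLaMA-1-7B checkpoint. The first call downloads several gigabytes of weights.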

## Zero-shot Performance & Efficiency Results
- Zero-shot scores are measured with EleutherAI/lm-evaluation-harness at commit [3326c54](https://github.com/EleutherAI/lm-evaluation-harness/tree/3326c547a733d598b4377e54be96e194861b964c).

<img alt="results" src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/compressed-llm/st-llama_zero-shot_scores.png" width="100%">
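For context on how zero-shot multiple-choice scores of this kind are usually computed: each answer choice is scored by the model's log-likelihood of that choice given the prompt, and the highest-scoring choice is taken as the prediction. The sketch below reproduces that selection rule plus a length-normalized variant, assuming the per-choice log-likelihoods are already computed. As far as we understand, the harness's `acc_norm` divides by each choice's character length, but treat the exact normalization (and any per-task differences) as an assumption. `argmax`, `score_multiple_choice`, and `mean_metric` are hypothetical helpers.

```python
from typing import Dict, List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def score_multiple_choice(loglikelihoods: Sequence[float], choices: Sequence[str],
                          gold: int) -> Dict[str, float]:
    """Score one example from per-choice log-likelihoods.

    acc:      1.0 if the highest-log-likelihood choice is the gold answer.
    acc_norm: same, after dividing each log-likelihood by the choice's
              character length (assumed normalization).
    """
    if len(loglikelihoods) != len(choices):
        raise ValueError("need one log-likelihood per choice")
    normalized = [ll / len(c) for ll, c in zip(loglikelihoods, choices)]
    return {
        "acc": 1.0 if argmax(loglikelihoods) == gold else 0.0,
        "acc_norm": 1.0 if argmax(normalized) == gold else 0.0,
    }


def mean_metric(per_example: List[Dict[str, float]], key: str) -> float:
    """Average a per-example metric over a task."""
    return sum(r[key] for r in per_example) / len(per_example)
```

Length normalization matters because longer choices accumulate more negative log-likelihood. A long correct answer can lose under `acc` yet win under `acc_norm`.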

## License
- All rights related to this repository and the compressed models are reserved by Nota Inc.
- The intended use is strictly limited to research and non-commercial projects.

## Acknowledgments
- [LLM-Pruner](https://github.com/horseee/LLM-Pruner), which utilizes [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), [PEFT](https://github.com/huggingface/peft), and [Alpaca-LoRA](https://github.com/tloen/alpaca-lora). Thanks for the pioneering work on structured pruning of LLMs!
- Meta AI's [LLaMA](https://github.com/facebookresearch/llama) and LMSYS Org's [Vicuna](https://github.com/lm-sys/FastChat/blob/main/docs/vicuna_weights_version.md). Thanks for the open-source LLMs!

## Citation
```bibtex
@article{kim2024shortened,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={arXiv preprint arXiv:2402.02834},
year={2024},
url={https://arxiv.org/abs/2402.02834}
}
```
```bibtex
@article{kim2024mefomo,
title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
journal={ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)},
year={2024},
url={https://openreview.net/forum?id=18VGxuOdpu}
}
```