Shortened LLaMA Model Card

Shortened LLaMA is a family of depth-pruned LLaMA models and variants (such as Vicuna) for efficient text generation.

Compression Method

After identifying unimportant Transformer blocks, we perform one-shot pruning and light LoRA-based retraining.

[Figure: overview of the compression method]
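
As a rough illustration of the one-shot step described above, the sketch below picks the lowest-scoring blocks and drops them in a single pass. The importance scores, the `protected` argument, and both function names are hypothetical. How the scores are actually computed (for example by the PPL or Taylor+ criteria listed under Model Links) and the LoRA retraining stage are not shown.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_blocks_to_prune(
    importance: Sequence[float],
    num_to_prune: int,
    protected: Sequence[int] = (),
) -> List[int]:
    """Return the indices of the least important blocks, in ascending order.

    `importance[i]` is an assumed per-block score (higher = more important).
    `protected` lists block indices that must never be removed (hypothetical).
    """
    keep_always = set(protected)
    candidates = [i for i in range(len(importance)) if i not in keep_always]
    if not 0 <= num_to_prune <= len(candidates):
        raise ValueError("num_to_prune exceeds the number of prunable blocks")
    # Rank candidates from least to most important; ties resolve by index.
    ranked = sorted(candidates, key=lambda i: importance[i])
    return sorted(ranked[:num_to_prune])


def prune_blocks(blocks: Sequence[T], to_prune: Sequence[int]) -> List[T]:
    """Drop the selected blocks in one shot, preserving the original order."""
    drop = set(to_prune)
    return [block for i, block in enumerate(blocks) if i not in drop]


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.05, 0.7]  # made-up scores for five blocks
    removed = select_blocks_to_prune(scores, num_to_prune=2)
    print(removed, prune_blocks(["b0", "b1", "b2", "b3", "b4"], removed))
```

With a real model, `blocks` would stand in for the list of Transformer decoder layers, and the surviving layers would then be retrained briefly with LoRA.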

Model Links

| Source Model | Pruning Ratio | Pruning Criterion | HF Models Link |
|---|---|---|---|
| LLaMA-1-7B | 20% | PPL | nota-ai/st-llama-1-5.5b-ppl |
| LLaMA-1-7B | 20% | Taylor+ | nota-ai/st-llama-1-5.5b-taylor |
| Vicuna-v1.3-7B | 20% | PPL | nota-ai/st-vicuna-v1.3-5.5b-ppl |
| Vicuna-v1.3-7B | 20% | Taylor+ | nota-ai/st-vicuna-v1.3-5.5b-taylor |
| Vicuna-v1.3-13B | 21% | PPL | nota-ai/st-vicuna-v1.3-10.5b-ppl |
| Vicuna-v1.3-13B | 21% | Taylor+ | nota-ai/st-vicuna-v1.3-10.5b-taylor |
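
The listed checkpoints can presumably be loaded like any other causal language model on the Hub. The following is a minimal sketch using the Hugging Face `transformers` Auto classes; it assumes each repository ships a compatible tokenizer and config, which this card does not state. The `MODEL_IDS` mapping and the `generate` helper are illustrative names, not part of any released code.

```python
# Repository IDs copied from the Model Links table, keyed by
# (source model, pruning criterion).
MODEL_IDS = {
    ("LLaMA-1-7B", "PPL"): "nota-ai/st-llama-1-5.5b-ppl",
    ("LLaMA-1-7B", "Taylor+"): "nota-ai/st-llama-1-5.5b-taylor",
    ("Vicuna-v1.3-7B", "PPL"): "nota-ai/st-vicuna-v1.3-5.5b-ppl",
    ("Vicuna-v1.3-7B", "Taylor+"): "nota-ai/st-vicuna-v1.3-5.5b-taylor",
    ("Vicuna-v1.3-13B", "PPL"): "nota-ai/st-vicuna-v1.3-10.5b-ppl",
    ("Vicuna-v1.3-13B", "Taylor+"): "nota-ai/st-vicuna-v1.3-10.5b-taylor",
}


def generate(model_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load a checkpoint and return the prompt plus a greedy continuation."""
    # Imported lazily so the mapping above is usable without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: use half precision on GPU and full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    model.to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate(MODEL_IDS[("LLaMA-1-7B", "PPL")], "Depth pruning is")` would download the checkpoint on first use and return the prompt followed by the generated text.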

Zero-shot Performance & Efficiency Results

  • Evaluated with EleutherAI/lm-evaluation-harness, version 3326c54.
[Figure: zero-shot performance and efficiency results]

License

  • All rights related to this repository and the compressed models are reserved by Nota Inc.
  • The intended use is strictly limited to research and non-commercial projects.

Acknowledgments

Citation

@article{kim2024shortened,
  title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
  author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
  journal={arXiv preprint arXiv:2402.02834},
  year={2024},
  url={https://arxiv.org/abs/2402.02834}
}
@article{kim2024mefomo,
  title={Shortened LLaMA: A Simple Depth Pruning for Large Language Models},
  author={Kim, Bo-Kyeong and Kim, Geonmin and Kim, Tae-Ho and Castells, Thibault and Choi, Shinkook and Shin, Junho and Song, Hyoung-Kyu},
  journal={ICLR Workshop on Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)},
  year={2024},
  url={https://openreview.net/forum?id=18VGxuOdpu}
}