
SparseLlama-2-7b-evolcodealpaca-pruned_50.2of4

Model Overview

  • Model Architecture: Llama-2
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Pruned: 50% 2:4
  • Release Date: 7/2/2024
  • Version: 1.0
  • Model Developers: Neural Magic

Compressed version of Llama-2-7b specialized for code generation. This model was obtained by fine-tuning the Sparse Foundational model SparseLlama-2-7b-pruned_50.2of4 on the evol-codealpaca-v1 dataset, using SquareHead knowledge distillation with Llama-2-7b-evolcodealpaca as the teacher. It achieves HumanEval pass@1 of 34.58%, whereas the dense Llama-2-7b-evolcodealpaca model achieves 32.03%.

This model was produced as part of Neural Magic's Sparse Foundational Models initiative and demonstrates the capability of Sparse Foundational Models to transfer to the code-generation domain.
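The snippet below is a minimal usage sketch for loading the model with Hugging Face transformers and generating code; the prompt and generation settings are illustrative assumptions rather than part of this card, and may need to be adapted to your environment.

```python
# Minimal usage sketch (assumes transformers and accelerate are installed;
# prompt and generation settings below are illustrative, not prescribed by the card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralmagic/SparseLlama-2-7b-evolcodealpaca-pruned_50.2of4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```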

Model Optimizations

This model is derived from the Sparse Foundational model SparseLlama-2-7b-pruned_50.2of4, which was obtained by applying the SparseGPT algorithm to prune Llama-2-7b to 50% sparsity with a 2:4 mask. This optimization zeroes out 50% of the weights, which can reduce disk size and compute (FLOPs) by roughly the same factor when the sparsity is exploited.
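To make the 2:4 pattern concrete, the sketch below applies a magnitude-based 2:4 mask to a weight tensor: in every contiguous group of four weights, only the two largest-magnitude values are kept. This is only an illustration of the sparsity pattern, not the SparseGPT algorithm itself, which selects the mask and updates the remaining weights to minimize layer-wise reconstruction error.

```python
import torch

def apply_2of4_mask(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude weights in every group of 4 (illustration only)."""
    w = weight.reshape(-1, 4)                      # contiguous groups of 4 weights
    idx = w.abs().topk(2, dim=1).indices           # positions of the 2 largest magnitudes
    mask = torch.zeros_like(w).scatter(1, idx, 1.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(4, 8)
print(apply_2of4_mask(w))   # exactly 2 non-zeros in each group of 4
```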

Evaluation

This model was evaluated on the HumanEval benchmark using the bigcode-evaluation-harness.
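For reference, pass@k is typically estimated as in the HumanEval paper (Chen et al., 2021): for each problem, n completions are sampled, c of them pass the unit tests, and pass@k is the probability that at least one of k sampled completions passes. The sketch below shows that estimator; the exact sampling settings used for this card's numbers are not stated here.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples for one problem, 64 of which pass the tests
print(pass_at_k(n=200, c=64, k=1))   # 0.32 for this problem
```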

Accuracy

Model                                          | HumanEval pass@1 | Recovery
Llama-2-7b-evolcodealpaca                      | 32.03%           | --
SparseLlama-2-7b-evolcodealpaca-pruned_50.2of4 | 34.58%           | 108%
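Recovery is the sparse model's score expressed as a percentage of the dense baseline's score; since the sparse model outscores the dense one here, recovery exceeds 100%:

```python
# Recovery = sparse pass@1 / dense pass@1
sparse, dense = 34.58, 32.03
print(f"{100 * sparse / dense:.0f}%")   # 108%
```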