---
library_name: transformers
tags:
- Structured Pruning
- Phi-2
- Memory-efficient Pruning
license: mit
language:
- en
---
# Model Card for Bonsai-PrunedPhi-1.8B
We prune the Phi-2 (2.7B) model to 35% sparsity (1.8B parameters) and then fine-tune it on 100K sequences of length 2048 from the [C4 dataset](https://huggingface.co/datasets/c4).
Our pruning algorithm is described in the paper [Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes](https://arxiv.org/abs/2402.05406).
Code for the pruning algorithm is available [here](https://github.com/ldery/Bonsai/tree/main).
## Model Details
This model is derived by pruning the [Phi-2 model](https://huggingface.co/microsoft/phi-2).
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** Lucio Dery, Steven Kolawole, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar
- **Model type:** Decoder-only
- **Language(s) (NLP):** English
- **License:** MIT
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/ldery/Bonsai/tree/main
- **Paper:** https://arxiv.org/abs/2402.05406
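
### How to Get Started with the Model
A minimal loading sketch with 🤗 Transformers is shown below. The repository id is taken from the license link in this card; `trust_remote_code=True` and the half-precision / `device_map` settings are assumptions carried over from the base Phi-2 model, not a verified recipe.

```python
# Sketch: load the pruned model and run a short generation.
# The repo id comes from the license link in this card; adjust if needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "luciodery/Bonsai-PrunedPhi-1.8B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision for inference (assumption)
    trust_remote_code=True,      # Phi-2-derived repos may ship custom modeling code
    device_map="auto",
)

inputs = tokenizer("Structured pruning is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```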
## Training Details
### Training Data
Fine-tuned on 100K sequences of length 2048 from the [C4 dataset](https://huggingface.co/datasets/c4).
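For illustration only, the sketch below shows one way such a corpus could be assembled by streaming C4 and packing tokens into 2048-token sequences. The `allenai/c4` dataset id and the packing logic are assumptions; the exact preprocessing used for this model lives in the Bonsai repository.

```python
# Sketch: sample ~100K packed sequences of length 2048 from C4 (English split).
# Not the authors' preprocessing script; for illustration only.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

SEQ_LEN, NUM_SEQS = 2048, 100_000
sequences, buffer = [], []
for example in stream:
    buffer.extend(tokenizer(example["text"])["input_ids"])
    while len(buffer) >= SEQ_LEN and len(sequences) < NUM_SEQS:
        sequences.append(buffer[:SEQ_LEN])  # one packed 2048-token sequence
        buffer = buffer[SEQ_LEN:]
    if len(sequences) >= NUM_SEQS:
        break
```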
### Training Procedure
Full fine-tuning.
#### Training Hyperparameters
- Distillation KL weight: 0.01
- Learning rate: 1e-4
- Batch size: 128
- Optimizer: AdamW
- Warmup steps: 5
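
The sketch below illustrates, under stated assumptions, how these settings could fit together in one distillation-regularized fine-tuning step. `student`, `teacher`, and `loader` are hypothetical placeholders, and the constant-with-warmup schedule is an assumption; this is not the authors' training code.

```python
# Sketch: full fine-tuning with a small KL distillation term, using the
# hyperparameters listed above. student, teacher, loader are placeholders;
# each batch is assumed to contain input_ids, attention_mask, and labels.
import torch
import torch.nn.functional as F
from transformers import get_constant_schedule_with_warmup

KL_WEIGHT, LR, WARMUP_STEPS = 0.01, 1e-4, 5

optimizer = torch.optim.AdamW(student.parameters(), lr=LR)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=WARMUP_STEPS)

for batch in loader:                        # batches of 128 packed 2048-token sequences
    out = student(**batch)
    lm_loss = out.loss                      # standard next-token cross-entropy
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    kl = F.kl_div(                          # match the dense teacher's distribution
        F.log_softmax(out.logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    (lm_loss + KL_WEIGHT * kl).backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```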
## License
The model is licensed under the [MIT license](https://huggingface.co/luciodery/Bonsai-PrunedPhi-1.8B/blob/main/LICENSE).
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** NVIDIA A6000
## Citation
**BibTeX:**
@misc{dery2024everybody,
title={Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes},
author={Lucio Dery and Steven Kolawole and Jean-Francois Kagey and Virginia Smith and Graham Neubig and Ameet Talwalkar},
year={2024},
eprint={2402.05406},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
## Model Card Authors
Lucio Dery: ldery@andrew.cmu.edu
## Model Card Contact
ldery@andrew.cmu.edu