---
license: other
---
# Model Card for llama-7b-hf-28q_4bit-128g_WVU
## Model Description
`llama-7b-hf-28q_4bit-128g_WVU` is a model based on the
Llama architecture with 7 billion parameters.
The first 28 decoder layers have been quantized with the [`gptq`](https://github.com/qwopqwop200/GPTQ-for-LLaMa) method, using 4-bit precision and a group size of 128.
The last 4 decoder layers (1/8 of the decoder layers) and `lm_head` have then been fine-tuned for 1 epoch on the [wizard_vicuna_70k_unfiltered dataset](https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered).
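As a rough illustration (not the exact training script), the freezing scheme might look like the sketch below, assuming the standard `transformers` `LlamaForCausalLM` layout (decoder stack at `model.model.layers`); the checkpoint path is a placeholder, and the quantization of the first 28 layers is omitted here:

```python
# Minimal sketch of the layer-freezing scheme described above.
# The checkpoint path is a placeholder; quantization is not shown.
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("path/to/llama-7b-hf")

# Freeze every parameter, then unfreeze only the parts that are trained.
for param in model.parameters():
    param.requires_grad = False

for layer in model.model.layers[-4:]:  # last 4 of the 32 decoder layers
    for param in layer.parameters():
        param.requires_grad = True

for param in model.lm_head.parameters():
    param.requires_grad = True
```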
## Note
Quantization substantially reduces memory usage; however, it perturbs the original parameter values.
Additionally, fine-tuning only the last few layers lowers the memory required for training, but may cause minor performance degradation.
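As a back-of-the-envelope illustration of the memory savings (illustrative arithmetic only, not measured figures for this checkpoint):

```python
# Rough weight-memory estimate for a 7B-parameter model. The group
# overhead assumes one fp16 scale per 128-weight group, as a crude
# approximation of GPTQ's per-group metadata.
n_params = 7e9

fp16_gib = n_params * 2 / 2**30            # 2 bytes per weight
int4_gib = n_params * 0.5 / 2**30          # 0.5 bytes per weight
scales_gib = (n_params / 128) * 2 / 2**30  # one fp16 scale per group

print(f"fp16 weights : {fp16_gib:.1f} GiB")               # ~13.0 GiB
print(f"4-bit weights: {int4_gib + scales_gib:.1f} GiB")  # ~3.4 GiB
```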
Several alternatives exist for fine-tuning and quantizing LLaMA models. The specific method used here (quantizing the first layers, then fine-tuning the last few) is designed to compensate for the errors introduced during quantization, which can occasionally produce unexpected answers, and it lets the last few layers adapt to both the quantization error and the dataset.
It is worth mentioning that other methods may yield superior performance. For instance:
1. Fine-tuning the entire model for `X` epochs
2. Quantizing the first `K` layers
3. Fine-tuning the remaining layers for `Y` epochs
Nonetheless, since fine-tuning the entire model requires considerable resources (for example, 4 GPUs with 80 GB of VRAM for the 7B LLaMA),
this model omits the first step of the method described above, and the approach still works.
## Using the Model
Loading the model requires a custom `LlamaForCausalLM` implementation.
You can find the quantized LLaMA code [here](https://github.com/LearnItAnyway/quantized_llama).
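A hypothetical loading sketch follows; the `quantized_llama` import path and the checkpoint id are assumptions, so consult the linked repository for the actual entry point:

```python
# Hypothetical usage sketch. The import path and checkpoint id below are
# assumptions; the repository linked above defines the actual custom
# LlamaForCausalLM and how to load it.
from transformers import LlamaTokenizer
from quantized_llama import LlamaForCausalLM  # assumed import path

model_id = "llama-7b-hf-28q_4bit-128g_WVU"  # placeholder checkpoint id

tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is quantization?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```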
## References
1. Meta - LLaMA
2. [WizardLM](https://github.com/nlpxucan/WizardLM)
3. [GPTQ for LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa)
4. [Wizard Vicuna Unfiltered Dataset](https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered)
5. Various unlisted but excellent works, research efforts, and projects.