---
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
- en
datasets:
- flytech/python-codes-25k
co2_eq_emissions:
emissions: 1190
source: >-
Quantifying the Carbon Emissions of Machine Learning
https://mlco2.github.io/impact#compute
training_type: finetuning
hardware_used: 1 P100 16GB GPU
tags:
- text2code
- LoRA
- GPTQ
- Llama-2-7B-Chat
- text2python
- instruction2code
---
# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K
Generates Python code that accomplishes the given instruction.
## LoRA Adapter Head
### Description
Parameter-Efficient Fine-Tuning (PEFT) of the 4-bit GPTQ-quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.
- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** GPTQ 4-bit
- **PEFT:** LoRA
- **Finetuned from model:** [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)
- **Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)
## Intended uses & limitations
A personal project addressing the efficacy of quantization combined with PEFT.
### How to use
```
The quantized base model was fine-tuned with PEFT, so only the trained adapter is published here.
Merging a LoRA adapter into a GPTQ-quantized model is not yet supported.
So instead of loading a single fine-tuned model, load the base model and
apply the fine-tuned adapter on top.
```
```python
instruction = "Help me set up my daily to-do list!"
```
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the PEFT config of the trained adapter
config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

# Load the GPTQ-quantized base model on the GPU
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")

# Combine the trained LoRA adapter with the base model
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")

tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

inputs = tokenizer(instruction, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(code)
```
### Size Comparison
The table compares the VRAM required to load and train the FP16 base model against the 4-bit GPTQ-quantized model with PEFT.
The base-model values are taken from the Hugging Face [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).
| Model                   | Total Size | Training Using Adam |
| ------------------------|------------| --------------------|
| Base Model (FP16)       | 12.37 GB   | 49.48 GB            |
| 4-bit Quantized + PEFT  | 3.90 GB    | 11 GB               |
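As a rough, back-of-the-envelope check (not part of the original measurements), the FP16 figure is close to ~7B parameters at 2 bytes each, and the Adam training figure is roughly 4x the weight memory (weights, gradients, and two optimizer moment buffers):

```python
# Back-of-the-envelope VRAM estimate (parameter count is approximate, not from the card)
params = 6.74e9  # ~6.74B parameters in Llama-2-7B

fp16_weights_gb = params * 2 / 1024**3   # 2 bytes per parameter in FP16
adam_training_gb = fp16_weights_gb * 4   # weights + gradients + 2 Adam moment buffers

print(f"FP16 weights:         {fp16_weights_gb:.2f} GB")   # ~12.5 GB, close to 12.37 GB above
print(f"Adam training (est.): {adam_training_gb:.2f} GB")  # ~50 GB, close to 49.48 GB above
```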
## Training Details
### Training Data
**Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)
Trained on the `instruction` column of 20,000 randomly shuffled examples.
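A minimal sketch of how such a subset could be prepared with the `datasets` library; the split name and shuffle seed are assumptions, not the exact preprocessing used:

```python
from datasets import load_dataset

# Load the instruction/code pairs and take a shuffled 20k subset
# (seed and exact preprocessing are assumptions, not from the card)
ds = load_dataset("flytech/python-codes-25k", split="train")
ds = ds.shuffle(seed=42).select(range(20_000))
print(ds)
```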
### Training Procedure
A custom training loop written with Hugging Face Accelerate; a minimal setup sketch follows the hyperparameter lists below.
#### Training Hyperparameters
- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 4
- **gradient_accumulation_steps:** 8
- **global_step:** 625
LoraConfig
- ***r:*** 8
- ***lora_alpha:*** 32
- ***target_modules:*** ["k_proj","o_proj","q_proj","v_proj"]
- ***lora_dropout:*** 0.05
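A minimal sketch of how the hyperparameters above could be wired together with `peft`, `transformers`, and Accelerate. The warmup steps, `task_type`, use of `prepare_model_for_kbit_training`, and the omitted data pipeline are assumptions rather than the exact training script:

```python
import torch
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, get_linear_schedule_with_warmup

# LoRA configuration with the values listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",  # assumption: causal-LM task type
)

# GPTQ-quantized base model (requires optimum + auto-gptq) with trainable LoRA adapters
base = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
base = prepare_model_for_kbit_training(base)  # assumption: standard k-bit training prep
model = get_peft_model(base, lora_config)

# AdamW with linear decay over the 625 global steps reported above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=625)

# Accelerate handles device placement and gradient accumulation (8 steps)
accelerator = Accelerator(gradient_accumulation_steps=8)
model, optimizer, scheduler = accelerator.prepare(model, optimizer, scheduler)

# Training-loop skeleton (dataloader of batch size 4 omitted):
# for batch in dataloader:
#     with accelerator.accumulate(model):
#         loss = model(**batch).loss
#         accelerator.backward(loss)
#         optimizer.step()
#         scheduler.step()
#         optimizer.zero_grad()
```

With a per-device batch size of 4 and gradient accumulation of 8, each global step sees an effective batch of 32 examples, so 625 global steps covers the 20,000 training examples once.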
#### Hardware
- **GPU:** P100
## Additional Information
- ***Github:*** [Repository]()
- ***Intro to quantization:*** [Blog](https://huggingface.co/blog/merve/quantization)
- ***Emergent Feature:*** [Academic](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features)
- ***GPTQ Paper:*** [GPTQ](https://arxiv.org/pdf/2210.17323)
- ***bitsandbytes and beyond:*** [LLM.int8()](https://arxiv.org/pdf/2208.07339)
## Acknowledgment
Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise intro to quantization.
Thanks to the [Hugging Face team](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing#scrollTo=vT0XjNc2jYKy) for the notebook on GPTQ.
## Model Card Authors
Swastik Maiti