---
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
- en
datasets:
- flytech/python-codes-25k
co2_eq_emissions:
  emissions: 1190
  source: >-
    Quantifying the Carbon Emissions of Machine Learning
    https://mlco2.github.io/impact#compute
  training_type: finetuning
  hardware_used: 1 P100 16GB GPU
widget:
- text: hello this is an example
tags:
- text2code
- LoRA
- GPTQ
- Llama-2-7B-Chat
- text2python
- instruction2code
- nl2code
- python
---

# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

Generates Python code that accomplishes a given natural-language instruction.


## LoRA Adapter Head

### Description

Parameter-Efficient Fine-Tuning (PEFT) of the 4-bit GPTQ-quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.

- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** GPTQ 4-bit
- **PEFT:** LoRA
- **Finetuned from model [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)**
- **Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

## Intended uses & limitations

This adapter explores the efficacy of combining quantization with PEFT. It was implemented as a personal project.

### How to use

The quantized model was fine-tuned with PEFT, so only the trained adapter is published. Merging a LoRA adapter into a GPTQ-quantized model is not yet supported, so instead of loading a single fine-tuned model you load the base model and attach the fine-tuned adapter on top.

```python
instruction = """"Help me set up my daily to-do list!""""
```
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")                         # PEFT config of the trained adapter
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map='auto')  # load the GPTQ base model
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")                    # attach the trained adapter to the base model
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")                   # tokenizer saved alongside the adapter

inputs = tokenizer(instruction, return_tensors="pt").input_ids.to('cuda')
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(code)
```

### A Test Example
```python
User_Prompt = """Write a Python program to implement K-Means clustering. The program should take two mandatory arguments, k and data, where k is the number of clusters and data is a 2D array containing the data points k = 3
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]"""

inputs = tokenizer(User_Prompt, return_tensors="pt").input_ids.to('cuda')
outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
python_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Generated Output:",python_code)
>>>
```
``````
Generated Output:Write a Python program to implement K-Means clustering. The program should take two mandatory arguments, k and data, where k is the number of clusters and data is a 2D array containing the data points k = 3
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]] Ready for action! Let's do this! ```python
import numpy as np

def kmeans(data, k):
    # Initialize the centroids
    centroids = np.random.rand(k, 2)
    
    # Initialize the cluster assignments
    cluster_assignments = np.zeros(data.shape[0], dtype=int)
    
    # Iterate through the data points
    for i in range(data.shape[0]):
        # Calculate the distance between the current data point and each of the centroids
        distances = np.linalg.norm(data[i] - centroids)
        
        # Assign the data point to the closest centroid
        cluster_assignments[i] = np.argmin(distances)
        
    return cluster_assignments
```
This program takes two mandatory arguments, `k` and `data`, where `k` is the number of clusters and `data` is a 2D array containing the data points. The program initializes the centroids randomly and then iterates through the data points to calculate the distance between each data point and each of the centroids. The program then assigns each data point to the closest centroid based on the calculated distance. Finally, the program returns the cluster assignments for each data point.

Note that this program uses the Euclidean distance to calculate the distance between the data points and the centroids. You can change the distance metric if needed.

Also, this program assumes that the data points are 2D. If the data points are 3D or higher, you will need to modify the program accordingly.

I hope this helps! Let me know if you have any questions.

```python
# Example usage
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
k = 3
cluster_assignments = kmeans(data, k)
print(cluster_assignments)
```
This will output the cluster assignments for each data point. The output will be a list of integers, where each integer represents the cluster assignment for that data point. For example, if the data points are
---------------------------------------------------------------------

``````

## Size Comparison

The table below compares the VRAM required to load and to train the FP16 base model versus the 4-bit GPTQ-quantized model with PEFT.
The base-model values are taken from the Hugging Face [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).




| Model                   | Total Size  | Training with Adam  |
| ------------------------|-------------| --------------------|
| Base Model (FP16)       | 12.37 GB    | 49.48 GB            |
| 4-bit Quantized + PEFT  | 3.90 GB     | 11 GB               |
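
As a rough, illustrative back-of-the-envelope check (not an official calculation; the ~6.7B parameter count for Llama-2-7b is an assumption), the FP16 figures follow from the parameter count, with Adam training taking about 4x the load size:

```python
# Rough VRAM estimate; illustrative only, values will not match the calculator exactly.
n_params = 6.7e9                          # approximate parameter count of Llama-2-7b (assumption)

fp16_load_gb = n_params * 2 / 1024**3     # 2 bytes per parameter in FP16
adam_train_gb = fp16_load_gb * 4          # weights + gradients + two Adam optimizer states

print(f"Load (FP16):     {fp16_load_gb:.2f} GB")   # ~12.5 GB, close to 12.37 GB above
print(f"Training (Adam): {adam_train_gb:.2f} GB")  # ~49.9 GB, close to 49.48 GB above
```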


## Training Details

### Training Data

**Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

Trained on the `instruction` column of 20,000 randomly shuffled examples, as sketched below.
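
A minimal sketch, assuming the `datasets` library and an arbitrary shuffle seed (the exact preprocessing script is not published here), of how such a subset could be built:

```python
# Illustrative subset construction; the seed and split handling are assumptions.
from datasets import load_dataset

dataset = load_dataset("flytech/python-codes-25k", split="train")
subset = dataset.shuffle(seed=42).select(range(20_000))   # 20,000 randomly shuffled examples
print(subset["instruction"][0])                           # the column used for training
```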

### Training Procedure

A custom training loop written with Hugging Face Accelerate; a minimal sketch of how the pieces fit together is shown after the hyperparameters below.


#### Training Hyperparameters

- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 4
- **gradient_accumulation_steps:** 8
- **global_step:** 625

LoraConfig
- **r:** 8
- **lora_alpha:** 32
- **target_modules:** ["k_proj", "o_proj", "q_proj", "v_proj"]
- **lora_dropout:** 0.05
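
The sketch below is an illustrative reconstruction, not the actual training script for this adapter: only the hyperparameters listed above are taken from this card, while tokenization, maximum sequence length, shuffle seed, warmup, and saving details are assumptions.

```python
# Illustrative reconstruction of the training setup from the hyperparameters above.
import torch
from accelerate import Accelerator
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, get_scheduler

base_id = "TheBloke/Llama-2-7b-Chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj"],
    task_type="CAUSAL_LM",
))

# 20,000 randomly shuffled examples from the instruction column (see Training Data)
dataset = load_dataset("flytech/python-codes-25k", split="train")
dataset = dataset.shuffle(seed=42).select(range(20_000))

def tokenize(batch):
    tokens = tokenizer(batch["instruction"], truncation=True, max_length=512, padding="max_length")
    tokens["labels"] = tokens["input_ids"].copy()   # standard causal-LM objective (assumption)
    return tokens

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
tokenized.set_format("torch")
loader = torch.utils.data.DataLoader(tokenized, batch_size=4, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scheduler = get_scheduler("linear", optimizer=optimizer, num_warmup_steps=0,
                          num_training_steps=625)   # 20,000 / (4 * 8) = 625 global steps

accelerator = Accelerator(gradient_accumulation_steps=8)
model, optimizer, loader, scheduler = accelerator.prepare(model, optimizer, loader, scheduler)

model.train()
for batch in loader:
    with accelerator.accumulate(model):
        loss = model(**batch).loss
        accelerator.backward(loss)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

model.save_pretrained("Llama-2-7B-Chat-text2code")  # saves only the LoRA adapter weights
```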


#### Hardware

- **GPU:** 1x NVIDIA P100 (16 GB)


## Additional Information

- ***GitHub:*** [Repository](https://github.com/swastikmaiti/Llama-2-7B-Chat-PEFT.git)
- ***Intro to quantization:*** [Blog](https://huggingface.co/blog/merve/quantization)
- ***Emergent features:*** [Academic](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features)
- ***GPTQ paper:*** [GPTQ](https://arxiv.org/pdf/2210.17323)
- ***bitsandbytes and LLM.int8():*** [LLM.int8()](https://arxiv.org/pdf/2208.07339)

## Acknowledgment

Thanks to [Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction to quantization.
Thanks to the [Hugging Face team](https://huggingface.co/blog/gptq-integration#fine-tune-quantized-models-with-peft) for the [notebook](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing) on fine-tuning GPTQ models with PEFT.


## Model Card Authors

Swastik Maiti