---
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
- en
datasets:
- flytech/python-codes-25k
co2_eq_emissions:
  emissions: 1190
  source: >-
    Quantifying the Carbon Emissions of Machine Learning
    https://mlco2.github.io/impact#compute
  training_type: finetuning
  hardware_used: 1 P100 16GB GPU
tags:
- text2code
- LoRA
- GPTQ
- Llama-2-7B-Chat
- text2python
- instruction2code
---

# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

Generates Python code that accomplishes the instructed task.


## LoRA Adapter Head

### Description

Parameter-efficient fine-tuning (PEFT) of a 4-bit quantized Llama-2-7b-Chat from TheBloke/Llama-2-7b-Chat-GPTQ on the flytech/python-codes-25k dataset.

- **Language(s) (NLP):** English
- **License:** openrail
- **Quantization:** GPTQ 4bit
- **PEFT:** LoRA
- **Finetuned from model:** [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)
- **Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

## Intended uses & limitations

Explores the efficacy of quantization combined with PEFT. Implemented as a personal project.

### How to use

```
The quantized model is finetuned with PEFT, so this repository holds the trained adapter.
Merging a LoRA adapter with a GPTQ quantized model is not yet supported.
So instead of loading a single finetuned model, we need to load the base
model and load the finetuned adapter on top.
```

```python
instruction = "Help me set up my daily to-do list!"
```
```python
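from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Adapter config; it records which base model the adapter was trained on.
config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")
# Load the GPTQ base model onto the GPU (a GPTQ backend must be installed).
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
# Attach the finetuned LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

inputs = tokenizer(instruction, return_tensors="pt").input_ids.to('cuda')
# Greedy decoding; input_ids is passed by keyword for PeftModel.generate.
outputs = model.generate(input_ids=inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(code)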
```

### Size Comparison

The table compares the VRAM required to load and to train the FP16 base model
against the 4-bit GPTQ quantized model with PEFT.
The base model values are taken from the HuggingFace
[Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).




| Model                   | Total Size  | Training Using Adam |
| ------------------------|-------------|---------------------|
| Base Model              | 12.37 GB    | 49.48 GB            |
| 4bit Quantized + PEFT   | 3.90 GB     | 11 GB               |
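
The base model row can be reproduced with a short calculation. This is a rough sketch, assuming the common rule of thumb that training with Adam needs about 4x the weight memory (weights, gradients, and two optimizer moment buffers); the variable names are illustrative and the quantized figures are simply the measured values from the table.

```python
# Values taken from the table above, in GB.
fp16_weights_gb = 12.37      # FP16 base model, loaded
quantized_load_gb = 3.90     # 4-bit GPTQ model + LoRA adapter, loaded
quantized_train_gb = 11.0    # 4-bit GPTQ model + LoRA adapter, training

# Assumption: Adam training costs roughly 4x the weight memory
# (1x weights + 1x gradients + 2x Adam moment estimates).
ADAM_MULTIPLIER = 4
fp16_training_gb = fp16_weights_gb * ADAM_MULTIPLIER

# How many times smaller the quantized + PEFT setup is.
load_ratio = fp16_weights_gb / quantized_load_gb
train_ratio = fp16_training_gb / quantized_train_gb

print(f"FP16 training estimate: {fp16_training_gb:.2f} GB")
print(f"Loading footprint ratio: {load_ratio:.2f}x")
print(f"Training footprint ratio: {train_ratio:.2f}x")
```

Activations and batch size are not included in this estimate, so real training memory for either setup may differ.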


## Training Details

### Training Data

**Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

Trained on the `instruction` column of 20,000 randomly shuffled examples.

### Training Procedure

A custom training loop built with HuggingFace Accelerate.


#### Training Hyperparameters

- **Optimizer:** AdamW
- **lr:** 2e-5
- **decay:** linear
- **batch_size:** 4
- **gradient_accumulation_steps:** 8
- **global_step:** 625
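
The `global_step` value follows from the other numbers. The sketch below assumes a single epoch over the 20,000 training examples and a linear decay to zero with no warmup; neither assumption is stated in this card, and `linear_lr` is a hypothetical helper, not the scheduler actually used.

```python
num_examples = 20_000
batch_size = 4
gradient_accumulation_steps = 8
base_lr = 2e-5

# One optimizer step consumes batch_size * gradient_accumulation_steps examples.
effective_batch_size = batch_size * gradient_accumulation_steps
global_steps = num_examples // effective_batch_size  # assumes one epoch


def linear_lr(step: int, total_steps: int = global_steps, lr: float = base_lr) -> float:
    """Learning rate at `step` under linear decay to zero (no warmup assumed)."""
    remaining = max(0, total_steps - step)
    return lr * remaining / total_steps


print(effective_batch_size, global_steps)
print(linear_lr(0), linear_lr(global_steps // 2), linear_lr(global_steps))
```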

LoraConfig:
- ***r:*** 8
- ***lora_alpha:*** 32
- ***target_modules:*** ["k_proj","o_proj","q_proj","v_proj"]
- ***lora_dropout:*** 0.05
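
To get a feel for how small the adapter is, the number of trainable LoRA parameters can be estimated from these settings. This is a back-of-the-envelope sketch, assuming the public Llama-2-7B dimensions (hidden size 4096, 32 decoder layers, square attention projections), which are not stated in this card.

```python
# LoRA settings from the list above.
r = 8
lora_alpha = 32
target_modules = ["k_proj", "o_proj", "q_proj", "v_proj"]

# Assumed Llama-2-7B architecture (not stated in this card).
hidden_size = 4096
num_layers = 32

# Each adapted projection gets two low-rank matrices:
# A with shape (r, in_features) and B with shape (out_features, r).
params_per_module = r * hidden_size + hidden_size * r
trainable_params = params_per_module * len(target_modules) * num_layers

# LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_alpha / r

print(f"Trainable LoRA parameters: {trainable_params:,}")
print(f"LoRA scaling factor: {scaling}")
```

`lora_dropout` only affects the adapter inputs during training and adds no parameters.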


#### Hardware

- **GPU:** P100


## Additional Information

- ***Github:*** [Repository]()
- ***Intro to quantization:*** [Blog](https://huggingface.co/blog/merve/quantization)
- ***Emergent Feature:*** [Academic](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features)
- ***GPTQ Paper:*** [GPTQ](https://arxiv.org/pdf/2210.17323)
- ***BITSANDBYTES and further:*** [LLM.int8()](https://arxiv.org/pdf/2208.07339)

## Acknowledgment

Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the concise introduction.
Thanks to the [@HuggingFace Team](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing#scrollTo=vT0XjNc2jYKy) for the notebook on GPTQ.


## Model Card Authors

Swastik Maiti