metadata
library_name: peft
base_model: TheBloke/Llama-2-7b-Chat-GPTQ
pipeline_tag: text-generation
inference: false
license: openrail
language:
  - en
datasets:
  - flytech/python-codes-25k
co2_eq_emissions:
  emissions: 1190
  source: >-
    Quantifying the Carbon Emissions of Machine Learning
    https://mlco2.github.io/impact#compute
  training_type: finetuning
  hardware_used: 1 P100 16GB GPU
tags:
  - text2code
  - LoRA
  - GPTQ
  - Llama-2-7B-Chat
  - text2python
  - instruction2code

Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

Generate Python code that accomplishes the task instructed.

LoRA Adapter Head

Description

Parameter-efficient fine-tuning (PEFT) of a 4-bit quantized Llama-2-7b-Chat, taken from TheBloke/Llama-2-7b-Chat-GPTQ, on the flytech/python-codes-25k dataset.

Intended uses & limitations

Built as a personal project to assess the efficacy of quantization combined with PEFT.

How to use

The quantized model was fine-tuned with PEFT, so what is published here is the trained LoRA adapter.
Merging a LoRA adapter into a GPTQ-quantized model is not yet supported.
So instead of loading a single fine-tuned model, we need to load the base
model and then load the fine-tuned adapter on top of it.
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

instruction = "Help me set up my daily to-do list!"

config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")      # PEFT config
# Load the base model; device_map="auto" places it on the available GPU
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code") # Attach the trained adapter to the base model
tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

# Tokenize the instruction and move the token ids to the GPU
inputs = tokenizer(instruction, return_tensors="pt").input_ids.to('cuda')
# Greedy decoding: no sampling, a single beam
outputs = model.generate(input_ids=inputs, max_new_tokens=500, do_sample=False, num_beams=1)
code = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(code)
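
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded string begins with the instruction itself. A minimal sketch of keeping only the completion is shown below. `strip_prompt` is a hypothetical helper, not part of `transformers`, and plain lists stand in for the token-id tensors.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens generated after the prompt.

    Hypothetical helper: works on any sequence that supports slicing,
    including a 1-D tensor such as outputs[0].
    """
    return output_ids[prompt_length:]


# Stand-in token ids: the first three are the prompt, the rest are new tokens.
prompt_ids = [1, 15043, 3186]
output_ids = prompt_ids + [822, 1667, 2]

new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)  # [822, 1667, 2]
```

With the loading code above, this would likely be used as `tokenizer.decode(strip_prompt(outputs[0], inputs.shape[1]), skip_special_tokens=True)`.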

Size Comparison

The table compares the VRAM required to load and to train the FP16 base model against the 4-bit GPTQ-quantized model with PEFT. The base-model values are taken from the Hugging Face Model Memory Calculator.

| Model | Total Size | Training Using Adam |
|---|---|---|
| Base Model | 12.37 GB | 49.48 GB |
| 4bitQuantized+PEFT | 3.90 GB | 11 GB |
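
The base-model row appears to follow the rule of thumb that full fine-tuning with Adam needs about four times the memory of the weights alone (12.37 × 4 = 49.48). The sketch below reproduces that arithmetic and the resulting savings. The 4x factor is an assumption inferred from the table's own numbers, and the quantized figures are copied from the table rather than derived.

```python
# Assumed rule of thumb: weights + gradients + two Adam moment buffers.
ADAM_TRAINING_FACTOR = 4


def adam_training_estimate_gb(model_size_gb, factor=ADAM_TRAINING_FACTOR):
    """Rough VRAM estimate for full fine-tuning with Adam."""
    return model_size_gb * factor


base_load_gb = 12.37    # FP16 base model, from the table
quant_load_gb = 3.90    # 4-bit GPTQ + PEFT, from the table
quant_train_gb = 11.0   # 4-bit GPTQ + PEFT training, from the table

base_train_gb = adam_training_estimate_gb(base_load_gb)
load_saving = base_load_gb / quant_load_gb
train_saving = base_train_gb / quant_train_gb

print(f"{base_train_gb:.2f} GB")   # 49.48 GB
print(f"{load_saving:.1f}x")       # 3.2x
print(f"{train_saving:.1f}x")      # 4.5x
```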

Training Details

Training Data

Dataset: flytech/python-codes-25k

Trained on the instruction column of 20,000 randomly shuffled samples.

Training Procedure

Custom training loop built with Hugging Face Accelerate.

Training Hyperparameters

  • Optimizer: AdamW
  • lr: 2e-5
  • decay: linear
  • batch_size: 4
  • gradient_accumulation_steps: 8
  • global_step: 625
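
These values are consistent with a single pass over the 20,000 training samples. The sketch below checks the arithmetic, assuming a single GPU, that `global_step` counts optimizer updates, and that the linear decay runs from the initial rate down to zero with no warmup (the warmup setting is not stated here).

```python
batch_size = 4
gradient_accumulation_steps = 8
num_samples = 20_000
base_lr = 2e-5

# Samples consumed per optimizer update.
effective_batch_size = batch_size * gradient_accumulation_steps
# Optimizer updates needed for one pass over the data.
steps_per_epoch = num_samples // effective_batch_size


def linear_decay_lr(step, total_steps=steps_per_epoch, lr=base_lr):
    """Learning rate after `step` updates, assuming linear decay to zero and no warmup."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return lr * remaining


print(effective_batch_size)   # 32
print(steps_per_epoch)        # 625
print(linear_decay_lr(0))     # 2e-05
print(linear_decay_lr(625))   # 0.0
```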

LoraConfig

  • r: 8
  • lora_alpha: 32
  • target_modules: ["k_proj","o_proj","q_proj","v_proj"]
  • lora_dropout: 0.05
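
To give a sense of how small the adapter is, the sketch below counts the trainable parameters this configuration would add. It assumes the standard Llama-2-7B shape (hidden size 4096, 32 decoder layers, square attention projections), which is not stated in this card, and the usual LoRA layout of two low-rank matrices per target module.

```python
r = 8
lora_alpha = 32
target_modules = ["k_proj", "o_proj", "q_proj", "v_proj"]

hidden_size = 4096   # assumption: Llama-2-7B hidden size
num_layers = 32      # assumption: Llama-2-7B decoder layers

# Each adapted projection gets A (r x in_features) and B (out_features x r).
params_per_module = r * (hidden_size + hidden_size)
trainable_params = params_per_module * len(target_modules) * num_layers

# LoRA scales the low-rank update by lora_alpha / r.
scaling = lora_alpha / r

print(params_per_module)   # 65536
print(trainable_params)    # 8388608
print(scaling)             # 4.0
```

Under these assumptions the adapter has roughly 8.4 million trainable parameters, a small fraction of the base model's roughly 7 billion.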

Hardware

  • GPU: P100

Additional Information

Acknowledgment

Thanks to @AMerve Noyan for the precise intro. Thanks to the @HuggingFace Team for the notebook on GPTQ.

Model Card Authors

Swastik Maiti