	---
	library_name: peft
	base_model: TheBloke/Llama-2-7b-Chat-GPTQ
	pipeline_tag: text-generation
	inference: false
	license: openrail
	language:
	- en
	datasets:
	- flytech/python-codes-25k
	co2_eq_emissions:
	  emissions: 1190
	  source: >-
	    Quantifying the Carbon Emissions of Machine Learning
	    https://mlco2.github.io/impact#compute
	  training_type: finetuning
	  hardware_used: 1 P100 16GB GPU
	widget:
	- text: hello this is an example
	tags:
	- text2code
	- LoRA
	- GPTQ
	- Llama-2-7B-Chat
	- text2python
	- instruction2code
	- nl2code
	- python
	---

	# Llama-2-7b-Chat-GPTQ fine-tuned on PYTHON-CODES-25K

	Generate Python code that accomplishes the task instructed.


	## LoRA Adapter Head

	### Description

	Parameter-efficient fine-tuning (PEFT) of a 4-bit quantized Llama-2-7b-Chat on the flytech/python-codes-25k dataset.

	- Language(s) (NLP): English
	- License: openrail
	- Quantization: GPTQ 4bit
	- PEFT: LoRA
	- Finetuned from model [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ)
	- Dataset: [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

	## Intended uses & limitations

	Explores the efficacy of quantization combined with PEFT. Implemented as a personal project.

	### How to use

	```
	The quantized model was fine-tuned with PEFT, so only the trained LoRA adapter is published here.
	Merging a LoRA adapter into a GPTQ-quantized model is not yet supported.
	So instead of loading a single fine-tuned model, load the base
	model and attach the fine-tuned adapter on top.
	```

	```python
	instruction = """Help me set up my daily to-do list!"""
	```
	```python
	from peft import PeftModel, PeftConfig
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Adapter configuration (not used below; handy for inspecting the LoRA settings)
	config = PeftConfig.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")
	# Load the GPTQ-quantized base model
	model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GPTQ", device_map="auto")
	# Attach the trained LoRA adapter to the base model
	model = PeftModel.from_pretrained(model, "SwastikM/Llama-2-7B-Chat-text2code")
	tokenizer = AutoTokenizer.from_pretrained("SwastikM/Llama-2-7B-Chat-text2code")

	# Greedy decoding: no sampling, single beam
	inputs = tokenizer(instruction, return_tensors="pt").input_ids.to("cuda")
	outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
	code = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print(code)
	```

	### A Test Example
	```python
	User_Prompt = """Write a Python program to implement K-Means clustering. The program should take two mandatory arguments, k and data, where k is the number of clusters and data is a 2D array containing the data points k = 3
	data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]"""

	inputs = tokenizer(User_Prompt, return_tensors="pt").input_ids.to('cuda')
	outputs = model.generate(inputs, max_new_tokens=500, do_sample=False, num_beams=1)
	python_code = tokenizer.decode(outputs[0], skip_special_tokens=True)

	print("Generated Output:", python_code)
	```
	``````
	Generated Output:Write a Python program to implement K-Means clustering. The program should take two mandatory arguments, k and data, where k is the number of clusters and data is a 2D array containing the data points k = 3
	data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]] Ready for action! Let's do this! ```python
	import numpy as np

	def kmeans(data, k):
	    # Initialize the centroids
	    centroids = np.random.rand(k, 2)

	    # Initialize the cluster assignments
	    cluster_assignments = np.zeros(data.shape[0], dtype=int)

	    # Iterate through the data points
	    for i in range(data.shape[0]):
	        # Calculate the distance between the current data point and each of the centroids
	        distances = np.linalg.norm(data[i] - centroids)

	        # Assign the data point to the closest centroid
	        cluster_assignments[i] = np.argmin(distances)

	    return cluster_assignments
	```
	This program takes two mandatory arguments, `k` and `data`, where `k` is the number of clusters and `data` is a 2D array containing the data points. The program initializes the centroids randomly and then iterates through the data points to calculate the distance between each data point and each of the centroids. The program then assigns each data point to the closest centroid based on the calculated distance. Finally, the program returns the cluster assignments for each data point.

	Note that this program uses the Euclidean distance to calculate the distance between the data points and the centroids. You can change the distance metric if needed.

	Also, this program assumes that the data points are 2D. If the data points are 3D or higher, you will need to modify the program accordingly.

	I hope this helps! Let me know if you have any questions.

	```python
	# Example usage
	data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
	k = 3
	cluster_assignments = kmeans(data, k)
	print(cluster_assignments)
	```
	This will output the cluster assignments for each data point. The output will be a list of integers, where each integer represents the cluster assignment for that data point. For example, if the data points are
	---------------------------------------------------------------------

	``````
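As the output above shows, the decoded text echoes the prompt and wraps the generated code in Markdown fences. The following is a minimal post-processing sketch, not part of the original project: `strip_prompt` and `extract_python_blocks` are hypothetical helper names, and the sample `decoded` string is made up for illustration. Only complete fenced blocks are returned, so a block cut off by `max_new_tokens` is skipped.

```python
import re

# Three backticks, built programmatically so this example can sit inside a fenced block.
FENCE = "`" * 3

def strip_prompt(decoded: str, prompt: str) -> str:
    """Drop the echoed prompt from the start of the decoded text, if present."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded

def extract_python_blocks(text: str) -> list[str]:
    """Return the bodies of all complete python-fenced blocks found in `text`."""
    pattern = re.compile(FENCE + r"python[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    return [block.strip("\n") for block in pattern.findall(text)]

# Illustrative input only; a real `decoded` would come from tokenizer.decode(...).
prompt = "Write a function that adds two numbers."
decoded = (
    prompt
    + " Ready for action! "
    + FENCE + "python\ndef add(a, b):\n    return a + b\n" + FENCE
    + "\nThis adds two numbers."
)

answer = strip_prompt(decoded, prompt)
blocks = extract_python_blocks(answer)
print(blocks[0])
```

Generated code should still be reviewed before it is run; the K-Means sample above, for instance, calls `data.shape` on a plain list.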

	## Size Comparison

	The table compares the VRAM required to load and train the FP16 base model
	against the 4-bit GPTQ-quantized model with PEFT.
	The base-model values are taken from Hugging Face's [Model Memory Calculator](https://huggingface.co/docs/accelerate/main/en/usage_guides/model_size_estimator).




	| Model              | Total Size | Training Using Adam |
	| ------------------ | ---------- | ------------------- |
	| Base Model         | 12.37 GB   | 49.48 GB            |
	| 4bitQuantized+PEFT | 3.90 GB    | 11 GB               |
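To make the savings in the table concrete, here is a small arithmetic sketch using only the four values reported above. The `reduction` helper is a hypothetical name introduced for illustration. The ratio it prints for the base model (about 4x) suggests the calculator's Adam estimate is roughly four times the loading size, though that reading is an inference from the numbers, not something stated in this card.

```python
# Values copied from the table above, in GB.
base_load, base_train = 12.37, 49.48
quant_load, quant_train = 3.90, 11.0

def reduction(before: float, after: float) -> float:
    """Fractional saving when going from `before` to `after`."""
    return 1.0 - after / before

# Ratio of the Adam training estimate to the loading size for the base model.
adam_multiplier = base_train / base_load

print(f"Adam training / loading size (base model): {adam_multiplier:.2f}x")
print(f"Loading saving with 4-bit GPTQ + PEFT:  {reduction(base_load, quant_load):.1%}")
print(f"Training saving with 4-bit GPTQ + PEFT: {reduction(base_train, quant_train):.1%}")
```

With these numbers the quantized setup needs roughly 68% less memory to load and roughly 78% less to train, which is what lets training fit on a single 16 GB P100.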


	## Training Details

	### Training Data

	**Dataset:** [flytech/python-codes-25k](https://huggingface.co/datasets/flytech/python-codes-25k)

	Trained on the `instruction` column of 20,000 randomly shuffled examples.

	### Training Procedure

	Custom training loop built with Hugging Face Accelerate.


	#### Training Hyperparameters

	- Optimizer: AdamW
	- lr: 2e-5
	- decay: linear
	- batch_size: 4
	- gradient_accumulation_steps: 8
	- global_step: 625
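The hyperparameters above fit together arithmetically. The sketch below simulates only the step counting of a gradient-accumulation loop (no model or optimizer is involved) and shows that 20,000 examples at these settings give 625 optimizer steps, which would correspond to a single pass over the data. The `linear_lr` function is a hypothetical helper that assumes linear decay to zero with no warmup; the card does not say whether warmup was used.

```python
num_examples = 20_000      # from the Training Data section
batch_size = 4             # examples per micro-batch
grad_accum_steps = 8       # micro-batches per optimizer step
base_lr = 2e-5

effective_batch = batch_size * grad_accum_steps   # examples per optimizer step
micro_batches = num_examples // batch_size        # forward/backward passes per epoch

optimizer_steps = 0
for micro_step in range(1, micro_batches + 1):
    # Gradients would accumulate on every micro-batch;
    # the weights are updated only once every grad_accum_steps micro-batches.
    if micro_step % grad_accum_steps == 0:
        optimizer_steps += 1

def linear_lr(step: int, total_steps: int, lr: float = base_lr) -> float:
    """Linear decay from `lr` to 0 over `total_steps` (assumes no warmup)."""
    return lr * max(0.0, 1.0 - step / total_steps)

print("effective batch size:", effective_batch)
print("optimizer steps per epoch:", optimizer_steps)
print("lr at the halfway point:", linear_lr(optimizer_steps // 2, optimizer_steps))
```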

	LoraConfig
	- *r:* 8
	- *lora_alpha:* 32
	- *target_modules:* ["k_proj","o_proj","q_proj","v_proj"]
	- *lora_dropout:* 0.05
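For a sense of how small the adapter is, the following back-of-the-envelope sketch counts the trainable LoRA parameters implied by this configuration. The model dimensions (`hidden_size = 4096`, `num_layers = 32`, square attention projections) are assumptions taken from the public Llama-2-7B architecture, not from this card.

```python
# Assumed Llama-2-7B dimensions (public architecture, not stated in this card).
hidden_size = 4096
num_layers = 32

# LoRA settings from the list above.
r = 8
lora_alpha = 32
target_modules = ["k_proj", "o_proj", "q_proj", "v_proj"]

# Each adapted Linear(in_features, out_features) gains two low-rank matrices:
# A with shape (r, in_features) and B with shape (out_features, r).
in_features = out_features = hidden_size
params_per_module = r * (in_features + out_features)

trainable_params = params_per_module * len(target_modules) * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"LoRA parameters per module: {params_per_module:,}")
print(f"Total trainable LoRA parameters: {trainable_params:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapter holds about 8.4 million trainable parameters, a small fraction of the roughly 7 billion parameters in the frozen base model.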


	#### Hardware

	- GPU: P100


	## Additional Information

	- *Github:* [Repository](https://github.com/swastikmaiti/Llama-2-7B-Chat-PEFT.git)
	- *Intro to quantization:* [Blog](https://huggingface.co/blog/merve/quantization)
	- *Emergent Feature:* [Academic](https://timdettmers.com/2022/08/17/llm-int8-and-emergent-features)
	- *GPTQ Paper:* [GPTQ](https://arxiv.org/pdf/2210.17323)
	- *bitsandbytes and further:* [LLM.int8()](https://arxiv.org/pdf/2208.07339)

	## Acknowledgment

	Thanks to [@Merve Noyan](https://huggingface.co/blog/merve/quantization) for the precise introduction.
	Thanks to the [@Hugging Face Team](https://huggingface.co/blog/gptq-integration#fine-tune-quantized-models-with-peft) for the [notebook](https://colab.research.google.com/drive/1_TIrmuKOFhuRRiTWN94iLKUFu6ZX4ceb?usp=sharing) on GPTQ.


	## Model Card Authors

	Swastik Maiti