---
license: mit
datasets:
- mlabonne/guanaco-llama2-1k
language:
- en
metrics:
- bleu
tags:
- text-generation-inference
pipeline_tag: text-generation
---
# Deployed Model
AjayMukundS/Llama-2-7b-chat-finetune
## Model Description
This is a Llama 2 model with 7 billion parameters, fine-tuned on the dataset from **mlabonne/guanaco-llama2**. The training data consists of conversations between a human and an assistant, where the human poses queries and the assistant responds to them.
In the case of Llama 2, the chat models use the following prompt template:

`<s>[INST] <<SYS>>`
`SYSTEM PROMPT`
`<</SYS>>`
`USER PROMPT [/INST] MODEL ANSWER </s>`
- **System prompt** (optional): guides the model
- **User prompt** (required): the instruction / user query
- **Model answer** (required)
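For illustration, here is a minimal Python sketch of this template. The helper `build_llama2_prompt` is hypothetical and not part of the released model or training code; it simply wraps a query in the format shown above.

```python
def build_llama2_prompt(user_prompt: str, system_prompt: str | None = None) -> str:
    """Wrap a user query (and optional system prompt) in the Llama 2 chat format."""
    if system_prompt:
        return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_prompt} [/INST]"
    return f"<s>[INST] {user_prompt} [/INST]"

# Example: the model's answer would be generated after the closing [/INST] tag.
print(build_llama2_prompt("What is QLoRA?", "You are a helpful assistant."))
```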
## Training Data
The instruction dataset is reformatted to follow the above Llama 2 template.

- **Original dataset**: https://huggingface.co/datasets/timdettmers/openassistant-guanaco
- **Reformatted dataset with 1K samples**: https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k
- **Complete reformatted dataset**: https://huggingface.co/datasets/mlabonne/guanaco-llama2
To see how this dataset was created, check this notebook: https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing
To drastically reduce VRAM usage, the model is fine-tuned in 4-bit precision, which is why QLoRA is used here. Fine-tuning was performed on an **L4** GPU (Google Colab Pro).
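As an illustration, the reformatted 1K-sample dataset can be loaded with the `datasets` library. The snippet below assumes each row carries the template-formatted conversation in a single `text` column, as in mlabonne's reformatted datasets:

```python
from datasets import load_dataset

# Load the 1K-sample reformatted dataset from the Hugging Face Hub.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Each row is expected to hold one full conversation, already wrapped
# in the Llama 2 chat template, in a single "text" column.
print(dataset)
print(dataset[0]["text"][:300])
```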
## Process
1) Load the dataset as defined.
2) Configure bitsandbytes for 4-bit quantization.
3) Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro) with the corresponding tokenizer.
4) Load the QLoRA configuration and the regular training parameters, and pass everything to the SFTTrainer.
5) Start fine-tuning. A minimal sketch of these steps is shown below.
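The sketch below outlines these five steps with `transformers`, `peft`, `bitsandbytes`, and `trl`. It is illustrative only: the base checkpoint (`NousResearch/Llama-2-7b-chat-hf`) and all hyperparameters are assumptions rather than the exact values used for this model, and the `SFTTrainer` signature shown matches older `trl` releases (newer versions move some of these arguments into `SFTConfig`).

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# Assumed base checkpoint; the actual run may have used a different one.
base_model = "NousResearch/Llama-2-7b-chat-hf"

# Step 1: load the reformatted dataset.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Step 2: configure bitsandbytes for 4-bit (NF4) quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Step 3: load the Llama 2 model in 4-bit precision with its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Step 4: QLoRA adapter configuration and training parameters
# (illustrative hyperparameters, not the confirmed training settings).
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.1, bias="none", task_type="CAUSAL_LM"
)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,
)

# Step 5: pass everything to the SFTTrainer and start fine-tuning.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_args,
    max_seq_length=None,
)
trainer.train()
```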