AjayMukundS committed
Commit: cddb4a4
Parent(s): a7337c4

Update README.md

README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: text-generation
 # Deployed Model
 AjayMukundS/Llama-2-7b-chat-finetune
 
-
+# Model Description
 This is a Llama 2 model with 7 billion parameters, fine-tuned on the **mlabonne/guanaco-llama2** dataset. The training data is a chat between a human and an assistant, in which the human poses queries and the assistant responds to them in a suitable fashion.
 In the case of Llama 2, the following chat template is used for the chat models:
 
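The template block itself sits in the elided README lines between this hunk and the next. For reference, a minimal sketch of the standard Llama 2 chat format the README describes; the `format_llama2` helper below is hypothetical, not part of this repository:

```python
# Sketch of the standard Llama 2 chat format: an optional system prompt
# wrapped in <<SYS>> tags, a required user prompt inside [INST] tags,
# and the model answer following [/INST].
def format_llama2(user_prompt: str, system_prompt: str = "", answer: str = "") -> str:
    sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    return f"<s>[INST] {sys_block}{user_prompt} [/INST] {answer} </s>"

print(format_llama2("What is QLoRA?", system_prompt="You are a helpful assistant."))
```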
@@ -32,7 +32,7 @@ User prompt (required) --> to give the instruction / User Query
 
 Model Answer (required)
 
-
+# Training Data
 The instruction dataset is reformatted to follow the Llama 2 template above.
 
 **Original Dataset** --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco\
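Per the description above, the reformatted data is **mlabonne/guanaco-llama2**. A quick way to inspect a sample, assuming the Hugging Face `datasets` library and its usual single `text` column (an assumption; the column name is not shown in this diff):

```python
from datasets import load_dataset

# Load the reformatted Guanaco dataset named in the README; each row's
# "text" field is expected to already follow the Llama 2 [INST] template.
dataset = load_dataset("mlabonne/guanaco-llama2", split="train")
print(dataset[0]["text"])
```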
@@ -45,7 +45,7 @@ To know how this dataset was created, you can check this notebook --> https://co
 
 To drastically reduce VRAM usage, we must fine-tune the model in 4-bit precision, which is why we use QLoRA here. The model was fine-tuned on an **L4 (Google Colab Pro)** GPU.
 
-
+# Process
 1) Load the dataset as defined.
 2) Configure bitsandbytes for 4-bit quantization.
 3) Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro) with the corresponding tokenizer.
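Steps 2 and 3 map to a standard QLoRA loading setup. A minimal sketch, assuming `transformers` and `bitsandbytes`; the exact quantization settings and base checkpoint are not recorded in this diff, so the values below are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Step 2: configure bitsandbytes for 4-bit (NF4) quantization.
# Illustrative defaults, not the commit's recorded settings.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Step 3: load the base Llama 2 model in 4-bit precision with its tokenizer.
base_model = "meta-llama/Llama-2-7b-chat-hf"  # assumed base; the README does not name it
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 has no pad token; reusing EOS is common QLoRA practice
```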
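The deployed checkpoint named at the top of the README can then be queried with the chat template shown earlier. A minimal inference sketch, assuming standard `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the fine-tuned checkpoint this README describes.
repo_id = "AjayMukundS/Llama-2-7b-chat-finetune"
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Query it using the Llama 2 instruction format.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("<s>[INST] Who was Ada Lovelace? [/INST]", max_new_tokens=128)[0]["generated_text"])
```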