AjayMukundS committed
Commit: cddb4a4
Parent(s): a7337c4

Update README.md

README.md CHANGED
@@ -14,7 +14,7 @@ pipeline_tag: text-generation
 # Deployed Model
 AjayMukundS/Llama-2-7b-chat-finetune
 
-
+# Model Description
 This is a Llama 2 model with 7 billion parameters, fine-tuned on the **mlabonne/guanaco-llama2** dataset. The training data is a chat between a human and an assistant, in which the human poses queries and the assistant responds to them in a suitable fashion.
 In the case of Llama 2, the following chat template is used for the chat models:
 
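The template block itself sits in the elided README lines between this hunk and the next. For reference, a minimal sketch of the standard Llama 2 chat format the README describes; the `format_llama2` helper below is hypothetical, not part of this repository:

```python
# Sketch of the standard Llama 2 chat format: an optional system prompt
# wrapped in <<SYS>> tags, a required user prompt inside [INST] tags,
# and the model answer following [/INST].
def format_llama2(user_prompt: str, system_prompt: str = "", answer: str = "") -> str:
    sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    return f"<s>[INST] {sys_block}{user_prompt} [/INST] {answer} </s>"

print(format_llama2("What is QLoRA?", system_prompt="You are a helpful assistant."))
```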
@@ -32,7 +32,7 @@ User prompt (required) --> to give the instruction / User Query
 
 Model Answer (required)
 
-
+# Training Data
 The instruction dataset is reformatted to follow the Llama 2 template above.
 
 **Original Dataset** --> https://huggingface.co/datasets/timdettmers/openassistant-guanaco\
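Per the description above, the reformatted data is **mlabonne/guanaco-llama2**. A quick way to inspect a sample, assuming the Hugging Face `datasets` library and its usual single `text` column (an assumption; the column name is not shown in this diff):

```python
from datasets import load_dataset

# Load the reformatted Guanaco dataset named in the README; each row's
# "text" field is expected to already follow the Llama 2 [INST] template.
dataset = load_dataset("mlabonne/guanaco-llama2", split="train")
print(dataset[0]["text"])
```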
@@ -45,7 +45,7 @@ To know how this dataset was created, you can check this notebook --> https://co
 
 To drastically reduce VRAM usage, we must fine-tune the model in 4-bit precision, which is why we use QLoRA here. The model was fine-tuned on an **L4 (Google Colab Pro)** GPU.
 
-
+# Process
 1) Load the dataset as defined.
 2) Configure bitsandbytes for 4-bit quantization.
 3) Load the Llama 2 model in 4-bit precision on a GPU (L4 - Google Colab Pro) with the corresponding tokenizer.
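Steps 2 and 3 map to a standard QLoRA loading setup. A minimal sketch, assuming `transformers` and `bitsandbytes`; the exact quantization settings and base checkpoint are not recorded in this diff, so the values below are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Step 2: configure bitsandbytes for 4-bit (NF4) quantization.
# Illustrative defaults, not the commit's recorded settings.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)

# Step 3: load the base Llama 2 model in 4-bit precision with its tokenizer.
base_model = "meta-llama/Llama-2-7b-chat-hf"  # assumed base; the README does not name it
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 has no pad token; reusing EOS is common QLoRA practice
```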
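The deployed checkpoint named at the top of the README can then be queried with the chat template shown earlier. A minimal inference sketch, assuming standard `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the fine-tuned checkpoint this README describes.
repo_id = "AjayMukundS/Llama-2-7b-chat-finetune"
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Query it using the Llama 2 instruction format.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("<s>[INST] Who was Ada Lovelace? [/INST]", max_new_tokens=128)[0]["generated_text"])
```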