AjayMukundS committed
Commit: a7337c4 · Parent: eb89191
Update README.md

README.md CHANGED
@@ -6,18 +6,9 @@ language:
 - en
 metrics:
 - bleu
-library_name: adapter-transformers
 tags:
-- chemistry
-- biology
-- finance
-- legal
-- music
-- art
-- code
-- climate
-- medical
 - text-generation-inference
+pipeline_tag: text-generation
 ---
 
 # Deployed Model
@@ -27,9 +18,13 @@ AjayMukundS/Llama-2-7b-chat-finetune
 This is a Llama 2 Fine-Tuned Model with 7 Billion Parameters, trained on the Dataset from **mlabonne/guanaco-llama2**. The training data is a chat between a Human and an Assistant, where the Human poses queries and the Assistant responds to them in a suitable fashion.
 In the case of Llama 2, the following Chat Template is used for the chat models:
 
-**<s>[INST] <<SYS>>**
+**(s)[INST] ((sys))**
 
-**<</SYS>>**
+**SYSTEM PROMPT**
+
+**((/sys))**
+
+**User Prompt [/INST] Model Answer (/s)**
 
 System Prompt (optional) --> to guide the model
 
@@ -46,6 +41,8 @@ The Instruction Dataset is reformatted to follow the above Llama 2 template.
 
 **Complete Reformatted Dataset** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2
 
+To see how this dataset was created, you can check this notebook --> https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing
+
 To drastically reduce VRAM usage, we must fine-tune the model in 4-bit precision, which is why we use QLoRA here; the model was fine-tuned on an **L4 GPU (Google Colab Pro)**.
 
 ## Process
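A note on the template block added in this commit: the `(s)` and `((sys))` markers are stand-ins for the raw Llama 2 special tokens (`<s>`, `[INST]`, `<<SYS>>`, `<</SYS>>`, `[/INST]`, `</s>`), which the Hub's Markdown renderer would otherwise swallow. As a minimal sketch of how one (system, user, answer) triple is laid out in this format (the helper name is illustrative, not part of the repo):

```python
def format_llama2_chat(user_prompt: str,
                       system_prompt: str = "",
                       model_answer: str = "") -> str:
    """Render one exchange in the raw Llama 2 chat format.

    The system block is optional, matching the note in the README.
    """
    sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    return f"<s>[INST] {sys_block}{user_prompt} [/INST] {model_answer} </s>"


# Example: one training row in the guanaco-llama2 style
print(format_llama2_chat(
    user_prompt="What is QLoRA?",
    system_prompt="You are a helpful assistant.",
    model_answer="QLoRA fine-tunes a 4-bit quantized model with LoRA adapters.",
))
```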
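The **Complete Reformatted Dataset** link points at mlabonne/guanaco-llama2, where each row is already a single pre-templated string. A quick way to inspect it (a sketch using the `datasets` library; the `text` column name is what that dataset uses, but verify against the dataset card):

```python
from datasets import load_dataset

# Pull the reformatted Guanaco dataset referenced in the README.
dataset = load_dataset("mlabonne/guanaco-llama2", split="train")

# Each example is one string already in the Llama 2 chat format.
print(dataset[0]["text"][:300])
```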
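On the QLoRA point: loading the base model in 4-bit and training only small LoRA adapters is what keeps fine-tuning within an L4's VRAM. A minimal sketch with `transformers` + `peft` + `bitsandbytes`; the base checkpoint and every hyperparameter here are illustrative assumptions, not the exact settings used for this model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization: weights are stored in 4 bits while compute
# runs in fp16; this is the core VRAM saving behind QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = "NousResearch/Llama-2-7b-chat-hf"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)

# Freeze the quantized weights and attach trainable LoRA adapters.
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights train
```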