AjayMukundS committed on
Commit a7337c4
1 Parent(s): eb89191

Update README.md

Files changed (1)
  1. README.md +9 -12
README.md CHANGED
@@ -6,18 +6,9 @@ language:
  - en
  metrics:
  - bleu
- library_name: adapter-transformers
  tags:
- - chemistry
- - biology
- - finance
- - legal
- - music
- - art
- - code
- - climate
- - medical
  - text-generation-inference
+ pipeline_tag: text-generation
  ---

  # Deployed Model
@@ -27,9 +18,13 @@ AjayMukundS/Llama-2-7b-chat-finetune
  This is a fine-tuned Llama 2 model with 7 billion parameters, trained on the dataset from **mlabonne/guanaco-llama2**. The training data consists of chats between a human and an assistant, in which the human poses queries and the assistant responds to them appropriately.
  In the case of Llama 2, the following chat template is used for the chat models:

- **[INST] SYSTEM PROMPT**
+ **<s>[INST] <<SYS>>**

- **User Prompt [/INST] Model Answer**
+ **SYSTEM PROMPT**
+
+ **<</SYS>>**
+
+ **User Prompt [/INST] Model Answer </s>**

  System Prompt (optional) --> to guide the model

@@ -46,6 +41,8 @@ The Instruction Dataset is reformatted to follow the above Llama 2 template.

  **Complete Reformatted Dataset** --> https://huggingface.co/datasets/mlabonne/guanaco-llama2

+ To see how this dataset was created, you can check this notebook --> https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing
+
  To drastically reduce VRAM usage, we fine-tune the model in 4-bit precision, which is why we use QLoRA here; the model was fine-tuned on an **L4** GPU (Google Colab Pro).

  ## Process
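For readers reproducing the reformatting step the diff describes (turning each instruction/response pair into the Llama 2 chat template shown above), here is a minimal Python sketch. The function name, field names, and example strings are illustrative assumptions, not code from this repository:

```python
# Minimal sketch (not from this repo): wrapping one instruction/response
# pair in the Llama 2 chat template described in the README above.
def to_llama2_chat(user_prompt: str, model_answer: str, system_prompt: str = "") -> str:
    # The optional system prompt sits inside <<SYS>> ... <</SYS>> tags,
    # directly after <s>[INST]; the answer follows [/INST] and ends with </s>.
    sys_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n" if system_prompt else ""
    return f"<s>[INST] {sys_block}{user_prompt} [/INST] {model_answer} </s>"

# Example usage (illustrative strings only):
print(to_llama2_chat(
    user_prompt="What is QLoRA?",
    model_answer="QLoRA is a 4-bit quantized LoRA fine-tuning method.",
    system_prompt="You are a helpful assistant.",
))
```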
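And a minimal sketch of the 4-bit QLoRA setup the diff mentions, assuming the usual `transformers` + `peft` + `bitsandbytes` stack. The base checkpoint name and hyperparameters below are assumptions for illustration, not this repository's actual training configuration:

```python
# Minimal QLoRA sketch: load a Llama 2 7B chat base in 4-bit NF4 precision
# and attach small trainable LoRA adapters while the base stays frozen.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "NousResearch/Llama-2-7b-chat-hf"  # assumption: any Llama 2 7B chat checkpoint

# 4-bit quantization config; this is what drastically cuts VRAM usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA adapter config; r / alpha / dropout are illustrative values.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```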