|
--- |
|
license: mit |
|
datasets: |
|
- iamplus/LLama2-SFT-Data |
|
- iamplus/Open_Platypus_Orca |
|
- iamplus/Orca |
|
- iamplus/Conversational_Data |
|
--- |
|
|
|
|
|
**Description :** |
|
|
|
This model is trained on a mix of Orca data and open-source as well as closed multi-turn conversation data, producing a stronger reasoning model that is also capable of holding multi-turn conversations.
|
|
|
The dataset composition, prompt format, and training parameters are described below.
|
|
|
**Prompt Description :** |
|
|
|
The prompt template for the first turn looks like this: |
|
``` |
|
<s>[INST] <<SYS>> |
|
{{ system_prompt }} |
|
<</SYS>> |
|
|
|
{{ user_message }} [/INST] |
|
``` |
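
As an illustration, a minimal Python helper for assembling this first-turn prompt could look like the sketch below. The function name and example strings are illustrative assumptions, not part of the released code:

```python
# Illustrative sketch: build the first-turn Llama-2 chat prompt string.
def build_first_turn_prompt(system_prompt: str, user_message: str) -> str:
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

# Example usage (strings are placeholders):
print(build_first_turn_prompt("You are a helpful assistant.", "What is the capital of France?"))
```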
|
|
|
The prompt template for the multi-turn conversation looks like this: |
|
``` |
|
<s>[INST] <<SYS>> |
|
{{ system_prompt }} |
|
<</SYS>> |
|
|
|
{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST] |
|
``` |
|
|
|
This model follows Meta's official chat model prompt format. Please refer to https://huggingface.co/blog/llama2#how-to-prompt-llama-2 for details on how to prompt the model in single-turn and multi-turn conversations.
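
A hedged, minimal sketch of multi-turn generation with the Hugging Face Transformers library is shown below. The model path is a placeholder for this repository, the conversation strings are made up, and the generation settings (sampling, temperature, max_new_tokens) are illustrative assumptions rather than recommended defaults:

```python
# Sketch: generating a reply in a multi-turn conversation with transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/this-model"  # placeholder: replace with the actual repo id or local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.bfloat16, device_map="auto"
)

# Second-turn prompt following the template above: the first exchange is closed
# with </s>, and the new user message is wrapped in a fresh <s>[INST] ... [/INST].
prompt = (
    "<s>[INST] <<SYS>>\n"
    "You are a helpful assistant.\n"
    "<</SYS>>\n\n"
    "Hello, who are you? [/INST] I am an assistant trained to help with reasoning tasks. </s>"
    "<s>[INST] Can you explain chain-of-thought prompting in one sentence? [/INST]"
)

# add_special_tokens=False because the template already includes <s> tokens.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```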
|
|
|
**Base model :** meta-llama/Llama-2-70b-hf |
|
|
|
**Data :** |
|
1. 1M Orca data (GPT-4 Orca data - OpenOrca)
|
2. 1.7M chat data (includes OpenAssistant chat data, UltraChat, and many more open-source chat datasets)
|
3. 30k OpenPlatypus data |
|
|
|
**Training Params :** |
|
``` |
|
Number of Epochs : 2 |
|
Batch Size : 64 |
|
Sequence Length : 4096 |
|
Learning Rate : 2e-5 (Cosine) |
|
Weight Decay : 0.1 |
|
Gradient Clipping : 1.0 |
|
Gamma : 0.85 |
|
beta_1 : 0.9 |
|
beta_2 : 0.95 |
|
eps : 1e-5 |
|
Precision : bf16 |
|
Optimizer : Any Precision AdamW Optimizer |
|
``` |
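
For reference, a minimal sketch of how these hyperparameters map onto a standard PyTorch optimizer and scheduler is given below. It uses torch.optim.AdamW in place of the AnyPrecision AdamW optimizer actually used, and a tiny linear layer stands in for the Llama-2-70B model so the snippet runs as-is; bf16 precision and the Gamma value are not represented here:

```python
# Sketch: mapping the listed hyperparameters onto a PyTorch optimizer/scheduler.
# torch.optim.AdamW stands in for the AnyPrecision AdamW optimizer; the model and
# step count are placeholders for the real fine-tuning setup.
import torch
from torch import nn
from transformers import get_cosine_schedule_with_warmup

model = nn.Linear(8, 8)   # placeholder for the fine-tuned Llama-2-70B model
total_steps = 1000        # placeholder for 2 epochs at batch size 64

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,              # Learning Rate
    betas=(0.9, 0.95),    # beta_1, beta_2
    eps=1e-5,             # eps
    weight_decay=0.1,     # Weight Decay
)
# Cosine learning-rate schedule.
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=total_steps
)

# One illustrative training step: gradients are clipped to 1.0 before the update.
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # Gradient Clipping
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```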