
Description :

This model is trained on a mix of Orca data and open-source and closed multi-turn conversation data, with the aim of producing a stronger reasoning model that can also hold multi-turn conversations.

The dataset split, prompt format, and training parameters are described below.

Prompt Description :

The prompt template for the first turn looks like this:

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]
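As a minimal sketch, the first-turn template can be filled in with plain string formatting. The function name `build_first_turn_prompt` and the example strings are hypothetical and only illustrate the layout shown above.

```python
def build_first_turn_prompt(system_prompt: str, user_message: str) -> str:
    """Fill the first-turn template with a system prompt and a user message."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )


# Hypothetical example values, chosen only to show the resulting layout.
prompt = build_first_turn_prompt("You are a helpful assistant.", "What is 2 + 2?")
print(prompt)
```

Many tokenizers prepend the `<s>` token on their own, so whether to keep it in the string likely depends on how the text is tokenized afterwards.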

The prompt template for the multi-turn conversation looks like this:

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s><s>[INST] {{ user_msg_2 }} [/INST]
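The multi-turn layout can be sketched the same way: each completed exchange is closed with `</s>`, and the system block appears only in the very first `[INST]`. The helper `build_multi_turn_prompt` and its `history` argument are assumptions made for illustration, not part of the model's tooling.

```python
from typing import List, Tuple


def build_multi_turn_prompt(
    system_prompt: str,
    history: List[Tuple[str, str]],
    next_user_message: str,
) -> str:
    """Build a prompt from completed (user_message, model_answer) pairs,
    oldest first, followed by the next user message awaiting an answer."""
    system_block = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    prompt = ""
    for i, (user_msg, answer) in enumerate(history):
        # Only the first turn carries the system block.
        prefix = system_block if i == 0 else ""
        prompt += f"<s>[INST] {prefix}{user_msg} [/INST] {answer} </s>"
    # With an empty history this reduces to the first-turn template.
    prefix = system_block if not history else ""
    prompt += f"<s>[INST] {prefix}{next_user_message} [/INST]"
    return prompt


# Hypothetical conversation used only to show the layout.
multi_turn = build_multi_turn_prompt(
    "You are a helpful assistant.",
    [("Hi", "Hello!")],
    "How are you?",
)
print(multi_turn)
```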

This model follows Meta's official chat model prompt format. See https://huggingface.co/blog/llama2#how-to-prompt-llama-2 for how to prompt the model in single-turn and multi-turn conversations.

Base model : meta-llama/Llama-2-70b-hf

Data :

  1. 1M Orca data (GPT-4 Orca data - OpenOrca)
  2. 1.7M chat data (includes OpenAssistant chat data, UltraChat, and many other open-source chat datasets)
  3. 30k OpenPlatypus data
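
To make the relative weight of each source concrete, the listed sizes can be turned into shares of the total. This is a rough sketch that assumes the figures are approximate sample counts and that every sample is weighted equally; the dictionary keys are hypothetical labels.

```python
# Approximate sizes from the list above (assumed to be sample counts).
dataset_sizes = {
    "orca": 1_000_000,
    "chat": 1_700_000,
    "open_platypus": 30_000,
}

total = sum(dataset_sizes.values())
shares = {name: count / total for name, count in dataset_sizes.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these assumptions the mix comes to about 2.73M samples, with chat data making up roughly 62%, Orca data roughly 37%, and OpenPlatypus roughly 1%.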

Training Params :

Number of Epochs : 2
Batch Size : 64
Sequence Length : 4096
Learning Rate : 2e-5 (Cosine)
Weight Decay : 0.1
Gradient Clipping : 1.0
Gamma : 0.85
beta_1 : 0.9
beta_2 : 0.95
eps : 1e-5
Precision : bf16
Optimizer : Any Precision AdamW Optimizer
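
The learning rate entry can be read as a peak of 2e-5 decayed on a cosine curve. The sketch below shows one common form of that schedule; the decay floor of 0, the absence of warmup, and the step count of 1000 are assumptions, since none of them are stated above. The optimizer settings are collected under the argument names used by `torch.optim.AdamW`; the Any Precision AdamW optimizer may take additional arguments, and the role of Gamma is not specified, so it is left out.

```python
import math

BASE_LR = 2e-5  # peak learning rate from the list above

# Optimizer settings from the list above, keyed by torch.optim.AdamW argument names.
adamw_kwargs = {
    "lr": BASE_LR,
    "betas": (0.9, 0.95),
    "eps": 1e-5,
    "weight_decay": 0.1,
}


def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR,
              min_lr: float = 0.0) -> float:
    """Cosine decay from base_lr at step 0 to min_lr at total_steps (no warmup)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the real step count is not given
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_lr(step, TOTAL_STEPS))
```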

Model size : 69B params (Safetensors)
Tensor type : FP16

Datasets used to train iamplus/Llama-2-70b-hf-ChatOrca-v2