---
language:
- en
license: apache-2.0
tags:
- merge
- fine-tuned
datasets:
- cognitivecomputations/dolphin
- cognitivecomputations/dolphin-coder
- ise-uiuc/Magicoder-OSS-Instruct-75K
- teknium/openhermes
- migtissera/Synthia-v1.3
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
- ehartford/dolphin-2.2.1-mistral-7b
- SciPhi/SciPhi-Mistral-7B-32k
- ehartford/samantha-1.2-mistral-7b
- Arc53/docsgpt-7b-mistral
- HuggingFaceH4/zephyr-7b-beta
- meta-math/MetaMath-Mistral-7B
- Open-Orca/Mistral-7B-OpenOrca
- openchat/openchat-3.5-1210
- beowolx/MistralHermes-CodePro-7B-v1
- TIGER-Lab/MAmmoTH-7B-Mistral
- teknium/OpenHermes-2.5-Mistral-7B
- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
- mlabonne/NeuralHermes-2.5-Mistral-7B
model-index:
- name: Mistral-7B-Merge-14-v0.3-ft-step-9984
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 62.54
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3-ft-step-9984
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 82.18
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3-ft-step-9984
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 62.92
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3-ft-step-9984
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 53.7
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3-ft-step-9984
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 75.61
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3-ft-step-9984
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 25.25
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3-ft-step-9984
      name: Open LLM Leaderboard
---
# Model Description
This is a fine-tuned model based on EmbeddedLLM/Mistral-7B-Merge-14-v0.3, trained for 9984 steps.
The datasets used are:
* dolphin
* dolphin-coder
* Magicoder-OSS-Instruct-75K
* openhermes
* Synthia-v1.3
## Chat Template
This model uses the ChatML prompt format:
```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
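As a sketch of how the template above can be filled in programmatically, the Python helper below renders a list of chat messages into a ChatML string and trims a generated continuation at the first `<|im_end|>`. The names `build_chatml_prompt` and `extract_reply` are hypothetical, and the newline after the final `<|im_start|>assistant` header is an assumption based on common ChatML usage. If the tokenizer in this repository ships a chat template, `tokenizer.apply_chat_template` from transformers is likely the safer route.

```python
from typing import Dict, List

# System prompt copied from the template above; replace it as needed.
DEFAULT_SYSTEM = "You are Dolphin, a helpful AI assistant."


def build_chatml_prompt(messages: List[Dict[str, str]], system: str = DEFAULT_SYSTEM) -> str:
    """Render {"role": ..., "content": ...} messages into a ChatML prompt.

    The result ends with an open assistant turn so the model writes the reply.
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for msg in messages:
        role = msg["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{msg['content']}<|im_end|>\n")
    # Assumption: a newline follows the assistant header, as in common ChatML usage.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def extract_reply(generated: str) -> str:
    """Keep only the text before the first <|im_end|> in the model's continuation."""
    return generated.split("<|im_end|>", 1)[0].strip()


# Usage: a single user turn with the default system prompt.
prompt = build_chatml_prompt([{"role": "user", "content": "What is 2 + 2?"}])
print(prompt)
```

Generation would typically be stopped at `<|im_end|>`; `extract_reply` applies the same cut to raw output if the stop token was not configured.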
# Training
The model is scheduled to be fine-tuned for 3 epochs on 4 A100 GPUs using axolotl. This checkpoint corresponds to step 9984 of that run.
# Shout-Out to OSS
Thank you to the Open Source AI community for bringing together marvelous code frameworks and datasets.
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_EmbeddedLLM__Mistral-7B-Merge-14-v0.3-ft-step-9984).
| Metric |Value|
|---------------------------------|----:|
|Avg. |60.37|
|AI2 Reasoning Challenge (25-Shot)|62.54|
|HellaSwag (10-Shot) |82.18|
|MMLU (5-Shot) |62.92|
|TruthfulQA (0-shot) |53.70|
|Winogrande (5-shot) |75.61|
|GSM8k (5-shot) |25.25|
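The `Avg.` row appears to be the unweighted mean of the six benchmark scores. A quick Python check with the values copied from the table (the helper name `leaderboard_average` is illustrative, not part of any leaderboard tooling):

```python
from typing import Dict

# Scores copied from the table above.
SCORES: Dict[str, float] = {
    "AI2 Reasoning Challenge (25-Shot)": 62.54,
    "HellaSwag (10-Shot)": 82.18,
    "MMLU (5-Shot)": 62.92,
    "TruthfulQA (0-shot)": 53.70,
    "Winogrande (5-shot)": 75.61,
    "GSM8k (5-shot)": 25.25,
}


def leaderboard_average(scores: Dict[str, float]) -> float:
    """Unweighted mean of the benchmark scores."""
    return sum(scores.values()) / len(scores)


avg = leaderboard_average(SCORES)
print(f"{avg:.2f}")  # 60.37
```

The sum is 362.20, and 362.20 / 6 ≈ 60.37, which matches the reported average.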