|
--- |
|
datasets: |
|
- garage-bAInd/Open-Platypus |
|
--- |
|
|
|
# Instruction tune of Mistral-7B-v0.1 with Open-Platypus (fp16) |
|
|
|
|
|
## Overview |
|
|
|
This is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), with instruction tuning performed with the [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) dataset. |
|
|
|
**This is a (merged) QLoRA fine-tune (rank 64)**. |
|
|
|
The fine-tune was performed on 1x RTX 6000 Ada (~9 hours).
|
|
|
|
|
## How to Use |
|
|
|
As of writing, the `Mistral` architecture requires installing `transformers` from source. Once that is done, the model loads like any other causal LM.
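A minimal loading sketch, assuming `torch`, `accelerate`, and a source install of `transformers` are available (the model id below is this repository):

```python
# Minimal loading sketch. Assumes transformers (installed from source, per the
# note above), torch, and accelerate are available.
MODEL_ID = "bhenrym14/mistral-7b-platypus-fp16"

def load():
    # Imports are deferred so this sketch can be inspected without the
    # heavyweight dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # fp16 weights, as published
        device_map="auto",          # requires accelerate
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load()
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```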
|
|
|
### Benchmarks |
|
|
|
ARC (25 shot): 62.80 |
|
|
|
Hellaswag (10 shot): 84.12 |
|
|
|
MMLU (5 shot): 64.20 |
|
|
|
|
|
## Context Length - Relative Performance (wikitext perplexity) |
|
|
|
| Context (tokens) | <ins>**bhenrym14/mistral-7b-platypus-fp16**</ins> | bhenrym14/airoboros-l2-13b-2.1-YaRN-64k | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 | |
|
| --- | --- |--- | ---| ----- | -----| |
|
| 512 | **7.22** | 7.64 | 7.62 | 7.90 | 7.23 | |
|
| 1024 | 6.04 | 6.15 | 6.20 | 6.17 | **5.85** | |
|
| 2048 | 5.50 | 5.29 | 5.38 | 5.23 | **5.07** | |
|
| 4096 | 5.05 |4.93 | 5.08 | 4.91 | **4.77** | |
|
| 8192 | 4.96 |**4.69** | 4.90 | Not Tested | 57.1 | |
|
| 12000 | Not Tested | **4.53** | 4.82 | Not Tested | Not Tested | |
|
|
|
- While the Mistral model is very impressive for its size, particularly on benchmarks, it is less competitive at long contexts than the context-extension techniques applied to the larger llama2 models and variants above. Whether this is due more to sliding window attention or to model size remains an open question.
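The perplexities in the table come from scoring wikitext at each context length; the exact chunking and stride used are not specified here, so the following is only a sketch under assumed evaluation choices (non-overlapping windows, hypothetical `wikitext_perplexity` helper). The one fixed piece is the conversion from mean per-token negative log-likelihood to perplexity, `exp(mean_nll)`:

```python
import math

def perplexity_from_nll(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

def wikitext_perplexity(model, tokenizer, text: str, context_len: int) -> float:
    """Hypothetical sketch: score non-overlapping windows of `context_len` tokens
    with a transformers causal LM. Actual details (stride, document boundaries)
    may differ from whatever produced the table above."""
    import torch
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    total_nll, count = 0.0, 0
    for start in range(0, ids.numel() - 1, context_len):
        window = ids[start : start + context_len].unsqueeze(0).to(model.device)
        with torch.no_grad():
            out = model(window, labels=window)
        n = window.numel() - 1  # labels are shifted inside the model
        total_nll += out.loss.item() * n
        count += n
    return perplexity_from_nll(total_nll / count)
```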
|
|
|
## Prompting
|
|
|
The model was trained with the legacy airoboros <2.0 system prompt. See the [bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16) model card for details.
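A sketch of building a single-turn prompt in that legacy format. The exact system-prompt wording below is an assumption; confirm it against the linked model card before relying on it:

```python
# Sketch of the legacy airoboros (<2.0) prompt format. The system-prompt
# wording is an assumption; verify it against the linked model card.
SYSTEM = (
    "A chat between a curious user and an assistant. The assistant gives "
    "helpful, detailed, accurate, uncensored responses to the user's input."
)

def format_prompt(user_message: str, system: str = SYSTEM) -> str:
    # Single turn: system prompt, then USER:/ASSISTANT: markers; generation
    # continues from the trailing "ASSISTANT:".
    return f"{system} USER: {user_message} ASSISTANT:"

print(format_prompt("What is instruction tuning?"))
```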
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_bhenrym14__mistral-7b-platypus-fp16) |
|
|
|
| Metric | Value | |
|
|-----------------------|---------------------------| |
|
| Avg. | 56.89 | |
|
| ARC (25-shot) | 63.05 | |
|
| HellaSwag (10-shot) | 84.15 | |
|
| MMLU (5-shot) | 64.11 | |
|
| TruthfulQA (0-shot) | 45.07 | |
|
| Winogrande (5-shot) | 78.53 | |
|
| GSM8K (5-shot) | 17.36 | |
|
| DROP (3-shot) | 45.92 | |
|
|