|
--- |
|
datasets: |
|
- garage-bAInd/Open-Platypus |
|
--- |
|
|
|
# Instruction tune of Mistral-7B-v0.1 with Open-Platypus (fp16) |
|
|
|
|
|
## Overview |
|
|
|
This is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), with instruction tuning performed with the [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) dataset. |
|
|
|
**This is a (merged) QLoRA fine-tune (rank 64)**. |
|
|
|
The fine-tune was performed on 1x RTX 6000 Ada (~9 hours).
|
|
|
|
|
## How to Use |
|
|
|
As of writing, the `Mistral` architecture requires installing `transformers` from source. Once that is done, the model loads like any other causal LM.
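A minimal loading sketch, assuming `torch`, `accelerate`, and a source install of `transformers` are available (the model id below is this repository):

```python
# Minimal loading sketch. Assumes transformers (installed from source, per the
# note above), torch, and accelerate are available.
MODEL_ID = "bhenrym14/mistral-7b-platypus-fp16"

def load():
    # Imports are deferred so this sketch can be inspected without the
    # heavyweight dependencies installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # fp16 weights, as published
        device_map="auto",          # requires accelerate
    )
    return tokenizer, model

if __name__ == "__main__":
    tokenizer, model = load()
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```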
|
|
|
### Benchmarks |
|
|
|
ARC (25 shot): 62.80 |
|
|
|
Hellaswag (10 shot): 84.12 |
|
|
|
MMLU (5 shot): 64.20 |
|
|
|
|
|
## Context Length - Relative Performance (wikitext perplexity) |
|
|
|
| Context (tokens) | <ins>**bhenrym14/mistral-7b-platypus-fp16**</ins> | bhenrym14/airoboros-l2-13b-2.1-YaRN-64k | bhenrym14/airophin-13b-pntk-16k-fp16 | bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16 | jondurbin/airoboros-l2-13b-gpt4-1.4.1 | |
|
| --- | --- |--- | ---| ----- | -----| |
|
| 512 | **7.22** | 7.64 | 7.62 | 7.90 | 7.23 | |
|
| 1024 | 6.04 | 6.15 | 6.20 | 6.17 | **5.85** | |
|
| 2048 | 5.50 | 5.29 | 5.38 | 5.23 | **5.07** | |
|
| 4096 | 5.05 |4.93 | 5.08 | 4.91 | **4.77** | |
|
| 8192 | 4.96 |**4.69** | 4.90 | Not Tested | 57.1 | |
|
| 12000 | Not Tested | **4.53** | 4.82 | Not Tested | Not Tested | |
|
|
|
- While the Mistral model is very impressive for its size, particularly on benchmarks, it is less competitive at long contexts than the context-extension techniques applied to the larger llama2 models and variants above. Whether this is due more to sliding window attention or to model size remains an open question.
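The perplexities in the table come from scoring wikitext at each context length; the exact chunking and stride used are not specified here, so the following is only a sketch under assumed evaluation choices (non-overlapping windows, hypothetical `wikitext_perplexity` helper). The one fixed piece is the conversion from mean per-token negative log-likelihood to perplexity, `exp(mean_nll)`:

```python
import math

def perplexity_from_nll(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

def wikitext_perplexity(model, tokenizer, text: str, context_len: int) -> float:
    """Hypothetical sketch: score non-overlapping windows of `context_len` tokens
    with a transformers causal LM. Actual details (stride, document boundaries)
    may differ from whatever produced the table above."""
    import torch
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    total_nll, count = 0.0, 0
    for start in range(0, ids.numel() - 1, context_len):
        window = ids[start : start + context_len].unsqueeze(0).to(model.device)
        with torch.no_grad():
            out = model(window, labels=window)
        n = window.numel() - 1  # labels are shifted inside the model
        total_nll += out.loss.item() * n
        count += n
    return perplexity_from_nll(total_nll / count)
```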
|
|
|
## Prompting
|
|
|
The model was trained with the legacy airoboros <2.0 system prompt. See the [bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-fp16) model card for details.
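A sketch of building a single-turn prompt in that legacy format. The exact system-prompt wording below is an assumption; confirm it against the linked model card before relying on it:

```python
# Sketch of the legacy airoboros (<2.0) prompt format. The system-prompt
# wording is an assumption; verify it against the linked model card.
SYSTEM = (
    "A chat between a curious user and an assistant. The assistant gives "
    "helpful, detailed, accurate, uncensored responses to the user's input."
)

def format_prompt(user_message: str, system: str = SYSTEM) -> str:
    # Single turn: system prompt, then USER:/ASSISTANT: markers; generation
    # continues from the trailing "ASSISTANT:".
    return f"{system} USER: {user_message} ASSISTANT:"

print(format_prompt("What is instruction tuning?"))
```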
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) |
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_bhenrym14__mistral-7b-platypus-fp16) |
|
|
|
| Metric | Value | |
|
|-----------------------|---------------------------| |
|
| Avg. | 56.89 | |
|
| ARC (25-shot) | 63.05 | |
|
| HellaSwag (10-shot) | 84.15 | |
|
| MMLU (5-shot) | 64.11 | |
|
| TruthfulQA (0-shot) | 45.07 | |
|
| Winogrande (5-shot) | 78.53 | |
|
| GSM8K (5-shot) | 17.36 | |
|
| DROP (3-shot) | 45.92 | |
|
|