Adding Evaluation Results (#3)

9b6ada7 verified 3 months ago

5.26 kB

	---
	language:
	- en
	license: apache-2.0
	datasets:
	- 0-hero/Matter-0.2-alpha
	model-index:
	- name: Matter-0.2-7B-DPO
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: HuggingFaceH4/ifeval
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 33.03
	name: strict accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=0-hero/Matter-0.2-7B-DPO
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: BBH
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 10.06
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=0-hero/Matter-0.2-7B-DPO
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: hendrycks/competition_math
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 0.83
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=0-hero/Matter-0.2-7B-DPO
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 1.23
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=0-hero/Matter-0.2-7B-DPO
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 5.87
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=0-hero/Matter-0.2-7B-DPO
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 1.82
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=0-hero/Matter-0.2-7B-DPO
	name: Open LLM Leaderboard
	---

	## Matter 7B - 0.2 - DPO (Mistral 7B Finetune)

	DPO version of [Matter 7B](https://huggingface.co/0-hero/Matter-0.2-7B) fine-tuned on the [Matter dataset](https://huggingface.co/datasets/0-hero/Matter-0.2-alpha), which is curated from over 35 datsets analyzing >6B tokens


	### Training

	Prompt format: This model uses ChatML prompt format.
	```
	<\|im_start\|>system
	You are a helpful AI assistant.<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant
	```
	### Function Calling

	Model also supports function calling. Additional tokens for function calling

	Model function call tokens
	- <\|begin_func\|> - Function call start token
	- <\|end_func\|> - Function call end token

	Function call response tokens
	- <\|begin_func_response\|> - Function response start token
	- <\|end_func_response\|> - Function response end token

	Example
	```
	<\|im_start\|>system
	You are a helpful assistant with access to the following functions. Use them if required -
	{ "name": "get_news_headlines",
	"description": "Get the latest news headlines",
	"parameters":
	{ "type": "object",
	"properties":
	{ "country":
	{ "type": "string",
	"description": "The country for which to fetch news"
	}
	},
	"required": [ "country" ]
	}
	}
	<\|im_end\|>
	<\|im_start\|>user
	Can you tell me the latest news headlines for the United States?<\|im_end\|>
	<\|im_start\|>assistant
	<\|begin_func\|>{"name": "get_news_headlines", "arguments": '{"country": "United States"}'}<\|end_func\|><\|im_end\|>
	<\|im_start\|>user
	<\|begin_func_response\|>{
	"headlines":
	[
	"Biden announces new vaccine mandates",
	"Hurricane Ida devastates Louisiana",
	"Apple unveils new iPhone",
	"NASA's Perseverance rover collects first Mars rock sample"
	]
	}<\|end_func_response\|>
	<\|im_end\|>
	<\|im_start\|>assistant
	Here are the latest news headlines for the United States:
	1. Biden announces new vaccine mandates
	2. Hurricane Ida devastates Louisiana
	3. Apple unveils new iPhone
	4. NASA's Perseverance rover collects first Mars rock sample
	<\|im_end\|>
	```
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_0-hero__Matter-0.2-7B-DPO)

	\| Metric \|Value\|
	\|-------------------\|----:\|
	\|Avg. \| 8.81\|
	\|IFEval (0-Shot) \|33.03\|
	\|BBH (3-Shot) \|10.06\|
	\|MATH Lvl 5 (4-Shot)\| 0.83\|
	\|GPQA (0-shot) \| 1.23\|
	\|MuSR (0-shot) \| 5.87\|
	\|MMLU-PRO (5-shot) \| 1.82\|