---
base_model:
- Qwen/Qwen2.5-7B-Instruct
library_name: transformers
tags:
- reasoning
- qwen
license: apache-2.0
language:
- en
pipeline_tag: text-generation
model-index:
- name: Qwen2.5-7B-Anvita
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 64.33
      name: strict accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 35.48
      name: normalized accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 15.86
      name: exact match
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 10.29
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 13.47
      name: acc_norm
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 35.17
      name: accuracy
    source:
      url: >-
        https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Qwen2.5-7B-Anvita
      name: Open LLM Leaderboard
---


## Evaluation Results

| Metric                | Value |
|-----------------------|------:|
| Avg.                  | 29.18 |
| IFEval (0-Shot)       | 64.8  |
| BBH (3-Shot)          | 35.48 |
| MATH Level 5 (4-Shot) | 15.86 |
| GPQA (0-Shot)         | 10.29 |
| MuSR (0-Shot)         | 13.47 |
| MMLU-PRO (5-Shot)     | 35.17 |
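
The `Avg.` row can be checked from the six benchmark rows. The snippet below is a minimal sketch that assumes `Avg.` is the unweighted mean of the six scores, which matches the figures in the table; the names `SCORES` and `average_score` are illustrative and not part of any leaderboard tooling.

```python
# Scores copied from the Evaluation Results table (the "Avg." row is excluded).
SCORES = {
    "IFEval (0-Shot)": 64.8,
    "BBH (3-Shot)": 35.48,
    "MATH Level 5 (4-Shot)": 15.86,
    "GPQA (0-Shot)": 10.29,
    "MuSR (0-Shot)": 13.47,
    "MMLU-PRO (5-Shot)": 35.17,
}


def average_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the scores, rounded to two decimals.

    Assumption: every benchmark carries equal weight in the average.
    """
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    print(average_score(SCORES))
```

Run as a script, this prints `29.18`, which agrees with the `Avg.` row.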

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/sethuiyer/Qwen2.5-7B-Anvita/results_2024-10-27T11-40-06.834908.json).

For personal benchmarks, see [PERSONAL_BENCHMARK.md](./PERSONAL_BENCHMARK.md).