open-aditi-hi-v4 / README.md

Adding Evaluation Results (#2)

ed7d96e verified 10 months ago

7.59 kB

	---
	language:
	- hi
	- en
	license: apache-2.0
	base_model: teknium/OpenHermes-2.5
	model-index:
	- name: open-aditi-hi-v4
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: AI2 Reasoning Challenge (25-Shot)
	type: ai2_arc
	config: ARC-Challenge
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: acc_norm
	value: 60.15
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v4
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: HellaSwag (10-Shot)
	type: hellaswag
	split: validation
	args:
	num_few_shot: 10
	metrics:
	- type: acc_norm
	value: 81.84
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v4
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU (5-Shot)
	type: cais/mmlu
	config: all
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 61.32
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v4
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: TruthfulQA (0-shot)
	type: truthful_qa
	config: multiple_choice
	split: validation
	args:
	num_few_shot: 0
	metrics:
	- type: mc2
	value: 44.89
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v4
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Winogrande (5-shot)
	type: winogrande
	config: winogrande_xl
	split: validation
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 79.95
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v4
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GSM8k (5-shot)
	type: gsm8k
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 57.24
	name: accuracy
	source:
	url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=manishiitg/open-aditi-hi-v4
	name: Open LLM Leaderboard
	---


	Model trained on Hindi and English data.

	This model also includes dataset https://huggingface.co/datasets/sarvamai/samvaad-hi-v1

	Check latest evals at https://github.com/manishiitg/IndicEval


	Try it out: https://colab.research.google.com/drive/1A_hbsq1vrCeAh3dEMvtwxxNxcNZ1BUyW?usp=sharing

	For sample responose on different prompts checkout: https://github.com/manishiitg/hi-llm-eval


	#### Language Hi

	\| Model \| xlsum-hi \| truthfulqa-hi \| indic-arc-easy \| mmlu_hi \| indicqa \| flores \| indicheadline \| indicxparaphrase \| hellaswag-indic \| indicwikibio \| boolq-hi \| implicit_hate \| indic-arc-challenge \| indicsentiment \|
	\| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \|
	\| open-aditi-hi-v2 \| 0.4213 \| 0.6934 \| 0.4979 \| 0.3253 \| 0.0795 \| 43.6822 \| 0.4565 \| 0.6838 \| 0.2404 \| 0.4846 \| 0.8541 \| 11.5021 \| 0.4462 \| 0.9729 \|
	\| open-aditi-hi-v3 \| 0.4490 \| 0.5369 \| 0.5480 \| 0.1351 \| 0.0058 \| 48.2859 \| 0.4682 \| 0.8846 \| 0.4891 \| 0.5034 \| 0.5401 \| 8.8315 \| 0.4633 \| 0.9519 \|
	\| open-aditi-hi-v4 \| 0.4046 \| 0.7671 \| 0.4529 \| 0.2124 \| 0.0026 \| 47.8500 \| 0.1980 \| 0.7737 \| 0.3595 \| 0.4894 \| 0.7015 \| 5.9709 \| 0.3857 \| 0.9699 \|
	\| OpenHermes-2.5-Mistral-7B \| 0.1774 \| 0.3234 \| 0.3523 \| 0.2769 \| 0.2721 \| 30.3465 \| 0.1996 \| 0.8766 \| 0.2485 \| 0.3332 \| 0.5979 \| 0.2068 \| 0.3396 \| 0.9048 \|
	\| OpenHermes-2.5-Mistral-7B-AWQ \| 0.1894 \| 0.3428 \| 0.3291 \| 0.2750 \| 0.3116 \| 29.3681 \| 0.2062 \| 0.8536 \| 0.2479 \| 0.3067 \| 0.5272 \| 6.0594 \| 0.3157 \| 0.9218 \|
	\| open-aditi-hi-v1 \| 0.4212 \| 0.4230 \| 0.3889 \| 0.1398 \| 0.1306 \| 40.2376 \| 0.4248 \| 0.5939 \| 0.0848 \| 0.4104 \| 0.3758 \| 8.6105 \| 0.3558 \| 0.8798 \|
	\| Airavata \| 0.4650 \| 0.0466 \| 0.1128 \| 0.1336 \| 0.0155 \| 58.5260 \| 0.4346 \| 0.6419 \| 0.0550 \| 0.0637 \| 0.0128 \| 6.3612 \| 0.0836 \| 0.0992 \|

	#### Language En

	\| Model \| boolq \| truthfulqa \| arc-easy-exact \| mmlu \| hellaswag \| xlsum \| arc-challenge \|
	\| --- \| --- \| --- \| --- \| --- \| --- \| --- \| --- \|
	\| open-aditi-hi-v4 \| 0.3905 \| 0.3378 \| 0.8460 \| 0.5725 \| 0.7603 \| 0.4384 \| 0.7491 \|
	\| OpenHermes-2.5-Mistral-7B \| 0.4061 \| 0.2081 \| 0.8687 \| 0.5991 \| 0.7999 \| 0.4328 \| 0.7790 \|
	\| OpenHermes-2.5-Mistral-7B-AWQ \| 0.4199 \| 0.1897 \| 0.8569 \| 0.5816 \| 0.7826 \| 0.4317 \| 0.7611 \|
	\| open-aditi-hi-v3 \| 0.3749 \| 0.3097 \| 0.8384 \| 0.5478 \| 0.7645 \| 0.4352 \| 0.7415 \|
	\| open-aditi-hi-v2 \| 0.3982 \| 0.2999 \| 0.8388 \| 0.5544 \| 0.4738 \| 0.4349 \| 0.7235 \|
	\| open-aditi-hi-v1 \| 0.0434 \| 0.3317 \| 0.7588 \| 0.2597 \| 0.3509 \| 0.4288 \| 0.6271 \|
	\| Airavata \| 0.5086 \| 0.3574 \| 0.6772 \| 0.1165 \| 0.1799 \| 0.4393 \| 0.1630 \|

	Task: flores Metric: chrf

	Task: implicit_hate Metric: chrf

	Task: indicsentiment Metric: accuracy

	Task: indicxparaphrase Metric: accuracy

	Task: boolq-hi Metric: accuracy

	Task: truthfulqa-hi Metric: accuracy

	Task: indic-arc-easy Metric: accuracy

	Task: indicwikibio Metric: bleurt

	Task: hellaswag-indic Metric: accuracy

	Task: indicheadline Metric: bleurt

	Task: xlsum-hi Metric: bleurt

	Task: indic-arc-challenge Metric: accuracy

	Task: mmlu_hi Metric: average_acc

	Task: indicqa Metric: accuracy

	Task: arc-easy-exact Metric: accuracy

	Task: hellaswag Metric: accuracy

	Task: arc-challenge Metric: accuracy

	Task: mmlu Metric: average_acc

	Task: boolq Metric: accuracy

	Task: xlsum Metric: bleurt

	Task: truthfulqa Metric: accuracy




	Model evaluation on OpenLLM LeaderBoard

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/5dfae476da6d0311fd3d5432/ENzZwV2Z98uNlpyUz3Blp.png)

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/5dfae476da6d0311fd3d5432/SpSiu5lzA6JKJx8ICX_zd.png)




	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_manishiitg__open-aditi-hi-v4)

	\| Metric \|Value\|
	\|---------------------------------\|----:\|
	\|Avg. \|64.23\|
	\|AI2 Reasoning Challenge (25-Shot)\|60.15\|
	\|HellaSwag (10-Shot) \|81.84\|
	\|MMLU (5-Shot) \|61.32\|
	\|TruthfulQA (0-shot) \|44.89\|
	\|Winogrande (5-shot) \|79.95\|
	\|GSM8k (5-shot) \|57.24\|


	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_manishiitg__open-aditi-hi-v4)

	\| Metric \|Value\|
	\|---------------------------------\|----:\|
	\|Avg. \|64.23\|
	\|AI2 Reasoning Challenge (25-Shot)\|60.15\|
	\|HellaSwag (10-Shot) \|81.84\|
	\|MMLU (5-Shot) \|61.32\|
	\|TruthfulQA (0-shot) \|44.89\|
	\|Winogrande (5-shot) \|79.95\|
	\|GSM8k (5-shot) \|57.24\|