RA_Reasoner2.0 / README.md

Adding Evaluation Results (#2)

a40cf3e verified 6 days ago

5.91 kB

	---
	base_model: Daemontatox/RA_Reasoner
	license: apache-2.0
	datasets:
	- Daemontatox/Deepthinking-COT
	language:
	- en
	new_version: Daemontatox/RA_Reasoner2.0
	library_name: transformers
	tags:
	- COT
	- Reasoning
	- text-generation-inference
	pipeline_tag: text-generation
	model-index:
	- name: RA_Reasoner2.0
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: IFEval (0-Shot)
	type: wis-k/instruction-following-eval
	split: train
	args:
	num_few_shot: 0
	metrics:
	- type: inst_level_strict_acc and prompt_level_strict_acc
	value: 53.66
	name: averaged accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BBH (3-Shot)
	type: SaylorTwift/bbh
	split: test
	args:
	num_few_shot: 3
	metrics:
	- type: acc_norm
	value: 43.07
	name: normalized accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MATH Lvl 5 (4-Shot)
	type: lighteval/MATH-Hard
	split: test
	args:
	num_few_shot: 4
	metrics:
	- type: exact_match
	value: 22.89
	name: exact match
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: GPQA (0-shot)
	type: Idavidrein/gpqa
	split: train
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 9.96
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MuSR (0-shot)
	type: TAUR-Lab/MuSR
	args:
	num_few_shot: 0
	metrics:
	- type: acc_norm
	value: 7.18
	name: acc_norm
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
	name: Open LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: MMLU-PRO (5-shot)
	type: TIGER-Lab/MMLU-Pro
	config: main
	split: test
	args:
	num_few_shot: 5
	metrics:
	- type: acc
	value: 37.26
	name: accuracy
	source:
	url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
	name: Open LLM Leaderboard
	---

	![RA_REASONER](./image.webp)

	# RA_Reasoner 2.0

	## Model Details

	Developed by: [Daemontatox](#)
	License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
	Base Model: Daemontatox/RA_Reasoner

	This model is fine-tuned from the Falcon-10B-Instruct model, leveraging advanced training optimizations to enhance reasoning and instruction-following capabilities. It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

	---

	## Training Details

	- Frameworks Used: Unsloth, Hugging Face TRL
	- Fine-Tuning Focus: Emphasis on reasoning, logic-based tasks, and instruction comprehension.
	- Dataset: Includes examples from [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).
	- Optimization: Significant speedup during fine-tuning while maintaining model quality.

	Further details on hyperparameters and fine-tuning methodology will be added in future updates.

	---

	## Intended Use

	This model is intended for research and development in text generation, reasoning tasks, and instruction-following applications.

	### Key Features:
	- Enhanced reasoning capabilities for multi-step logical problems.
	- Robust instruction-following for complex tasks.
	- Fine-tuned for Chain-of-Thought (COT) reasoning and inference.

	### Applications:
	- Research on reasoning-based AI systems.
	- Tasks requiring logical deductions, such as question answering and problem-solving.
	- General text generation with a focus on nuanced understanding.

	---

	## Limitations and Warnings

	- This model is not designed for real-time or production-critical tasks.
	- Outputs may vary based on input specificity and complexity.
	- Users are responsible for ensuring ethical use and compliance with applicable regulations.

	---

	## Acknowledgments

	- Base model: Daemontatox/RA_Reasoner
	- Training acceleration powered by [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
	- Dataset contributions: [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).

	---# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__RA_Reasoner2.0-details)!
	Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FRA_Reasoner2.0&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

	\| Metric \|Value (%)\|
	\|-------------------\|--------:\|
	\|Average \| 29.00\|
	\|IFEval (0-Shot) \| 53.66\|
	\|BBH (3-Shot) \| 43.07\|
	\|MATH Lvl 5 (4-Shot)\| 22.89\|
	\|GPQA (0-shot) \| 9.96\|
	\|MuSR (0-shot) \| 7.18\|
	\|MMLU-PRO (5-shot) \| 37.26\|