RA_Reasoner2.0 / README.md
Daemontatox's picture
Adding Evaluation Results (#2)
a40cf3e verified
---
base_model: Daemontatox/RA_Reasoner
license: apache-2.0
datasets:
- Daemontatox/Deepthinking-COT
language:
- en
new_version: Daemontatox/RA_Reasoner2.0
library_name: transformers
tags:
- COT
- Reasoning
- text-generation-inference
pipeline_tag: text-generation
model-index:
- name: RA_Reasoner2.0
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: wis-k/instruction-following-eval
split: train
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 53.66
name: averaged accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: SaylorTwift/bbh
split: test
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 43.07
name: normalized accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: lighteval/MATH-Hard
split: test
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 22.89
name: exact match
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
split: train
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 9.96
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 7.18
name: acc_norm
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 37.26
name: accuracy
source:
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
name: Open LLM Leaderboard
---
![RA_REASONER](./image.webp)
# **RA_Reasoner 2.0**
## **Model Details**
**Developed by:** [Daemontatox](#)
**License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
**Base Model:** Daemontatox/RA_Reasoner
This model is fine-tuned from the Falcon-10B-Instruct model, leveraging advanced training optimizations to enhance reasoning and instruction-following capabilities. It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
---
## **Training Details**
- **Frameworks Used:** Unsloth, Hugging Face TRL
- **Fine-Tuning Focus:** Emphasis on reasoning, logic-based tasks, and instruction comprehension.
- **Dataset:** Includes examples from [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).
- **Optimization:** Significant speedup during fine-tuning while maintaining model quality.
Further details on hyperparameters and fine-tuning methodology will be added in future updates.
---
## **Intended Use**
This model is intended for **research and development** in text generation, reasoning tasks, and instruction-following applications.
### **Key Features:**
- Enhanced reasoning capabilities for multi-step logical problems.
- Robust instruction-following for complex tasks.
- Fine-tuned for Chain-of-Thought (COT) reasoning and inference.
### **Applications:**
- Research on reasoning-based AI systems.
- Tasks requiring logical deductions, such as question answering and problem-solving.
- General text generation with a focus on nuanced understanding.
---
## **Limitations and Warnings**
- This model is not designed for real-time or production-critical tasks.
- Outputs may vary based on input specificity and complexity.
- Users are responsible for ensuring ethical use and compliance with applicable regulations.
---
## **Acknowledgments**
- Base model: Daemontatox/RA_Reasoner
- Training acceleration powered by [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
- Dataset contributions: [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).
---# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__RA_Reasoner2.0-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FRA_Reasoner2.0&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
| Metric |Value (%)|
|-------------------|--------:|
|**Average** | 29.00|
|IFEval (0-Shot) | 53.66|
|BBH (3-Shot) | 43.07|
|MATH Lvl 5 (4-Shot)| 22.89|
|GPQA (0-shot) | 9.96|
|MuSR (0-shot) | 7.18|
|MMLU-PRO (5-shot) | 37.26|