---
base_model: Daemontatox/RA_Reasoner
license: apache-2.0
datasets:
- Daemontatox/Deepthinking-COT
language:
- en
new_version: Daemontatox/RA_Reasoner2.0
library_name: transformers
tags:
- COT
- Reasoning
- text-generation-inference
pipeline_tag: text-generation
model-index:
- name: RA_Reasoner2.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 53.66
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 43.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 22.89
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.96
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.18
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.26
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
---

![RA_REASONER](./image.webp)

# **RA_Reasoner 2.0**

## **Model Details**

**Developed by:** [Daemontatox](#)

**License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

**Base Model:** Daemontatox/RA_Reasoner

This model is fine-tuned from Daemontatox/RA_Reasoner, which is itself derived from Falcon-10B-Instruct, with training optimizations that strengthen reasoning and instruction-following capabilities. It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
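
For orientation, here is a minimal inference sketch. It assumes the weights are hosted on the Hugging Face Hub as `Daemontatox/RA_Reasoner2.0` and load through the standard `transformers` text-generation pipeline; the prompt and generation settings are illustrative placeholders rather than recommendations from this card.

```python
# Minimal usage sketch. Assumptions: the checkpoint lives on the Hub as
# "Daemontatox/RA_Reasoner2.0" and works with the standard text-generation
# pipeline; `device_map="auto"` requires the `accelerate` package.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Daemontatox/RA_Reasoner2.0",
    device_map="auto",   # spread the model across available GPUs/CPU
    torch_dtype="auto",  # keep the checkpoint's native precision
)

prompt = (
    "Question: A train travels 120 km in 1.5 hours. "
    "What is its average speed?\n"
    "Let's reason step by step."
)

result = generator(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])
```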

---

## **Training Details**

- **Frameworks Used:** Unsloth, Hugging Face TRL
- **Fine-Tuning Focus:** Reasoning, logic-based tasks, and instruction comprehension.
- **Dataset:** Examples drawn from [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).
- **Optimization:** Roughly 2x faster fine-tuning via Unsloth while maintaining model quality.

Further details on hyperparameters and fine-tuning methodology will be added in future updates.
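
In the meantime, the snippet below is only a rough, hypothetical sketch of an Unsloth + TRL supervised fine-tuning run of the kind described above. The LoRA settings, hyperparameters, and the assumption that the dataset exposes a `text` column are placeholders, not the values used for this model.

```python
# Hypothetical re-creation of the described setup: Unsloth for fast 4-bit LoRA
# loading, TRL's SFTTrainer for supervised fine-tuning on the CoT dataset.
# All hyperparameters are illustrative placeholders.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Daemontatox/RA_Reasoner",  # base model from the card metadata
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(  # attach LoRA adapters
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("Daemontatox/Deepthinking-COT", split="train")
# Assumes a "text" column holding the full prompt + chain-of-thought + answer.

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # newer TRL versions rename this to `processing_class`
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="ra_reasoner2_sft",
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        logging_steps=10,
    ),
)
trainer.train()
```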

---

## **Intended Use**

This model is intended for **research and development** in text generation, reasoning tasks, and instruction-following applications.

### **Key Features:**

- Enhanced reasoning capabilities for multi-step logical problems.
- Robust instruction-following for complex tasks.
- Fine-tuned for Chain-of-Thought (CoT) reasoning and inference (see the prompting sketch below).

### **Applications:**

- Research on reasoning-based AI systems.
- Tasks requiring logical deduction, such as question answering and problem-solving.
- General text generation with a focus on nuanced understanding.
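
The sketch below illustrates the kind of step-by-step prompting the model is tuned for. The repo id, the presence of a chat template on the tokenizer, and the prompt wording are assumptions made for illustration only.

```python
# Chain-of-thought style prompting sketch (assumed repo id and chat template).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Daemontatox/RA_Reasoner2.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

question = (
    "Three friends split a bill of $87. One pays $30 and another pays $27. "
    "How much does the third friend pay? Work through it step by step, then "
    "give the final answer on its own line."
)

if tokenizer.chat_template:  # instruct-tuned checkpoints usually ship one
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": question}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
else:  # fall back to a plain completion-style prompt
    input_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=300, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```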

---

## **Limitations and Warnings**

- This model is not designed for real-time or production-critical tasks.
- Outputs may vary based on input specificity and complexity.
- Users are responsible for ensuring ethical use and compliance with applicable regulations.

---

## **Acknowledgments**

- Base model: Daemontatox/RA_Reasoner
- Training acceleration powered by [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
- Dataset contributions: [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).

---

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__RA_Reasoner2.0-details)!

Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FRA_Reasoner2.0&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

| Metric              | Value (%) |
|---------------------|----------:|
| **Average**         |     29.00 |
| IFEval (0-Shot)     |     53.66 |
| BBH (3-Shot)        |     43.07 |
| MATH Lvl 5 (4-Shot) |     22.89 |
| GPQA (0-shot)       |      9.96 |
| MuSR (0-shot)       |      7.18 |
| MMLU-PRO (5-shot)   |     37.26 |