---
base_model: Daemontatox/RA_Reasoner
license: apache-2.0
datasets:
- Daemontatox/Deepthinking-COT
language:
- en
new_version: Daemontatox/RA_Reasoner2.0
library_name: transformers
tags:
- COT
- Reasoning
- text-generation-inference
pipeline_tag: text-generation
model-index:
- name: RA_Reasoner2.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 53.66
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 43.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 22.89
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.96
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.18
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.26
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
---

![RA_REASONER](./image.webp)

# **RA_Reasoner 2.0**

## **Model Details**

**Developed by:** [Daemontatox](#)  
**License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)  
**Base Model:** Daemontatox/RA_Reasoner

This model is fine-tuned from the Falcon-10B-Instruct model, leveraging advanced training optimizations to enhance reasoning and instruction-following capabilities. It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.  

---

## **Training Details**

- **Frameworks Used:** Unsloth, Hugging Face TRL  
- **Fine-Tuning Focus:** Emphasis on reasoning, logic-based tasks, and instruction comprehension.  
- **Dataset:** Includes examples from [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).  
- **Optimization:** Significant speedup during fine-tuning while maintaining model quality.  

Further details on hyperparameters and fine-tuning methodology will be added in future updates.

---

## **Intended Use**

This model is intended for **research and development** in text generation, reasoning tasks, and instruction-following applications.  

### **Key Features:**
- Enhanced reasoning capabilities for multi-step logical problems.
- Robust instruction-following for complex tasks.
- Fine-tuned for Chain-of-Thought (COT) reasoning and inference.  

### **Applications:**
- Research on reasoning-based AI systems.  
- Tasks requiring logical deductions, such as question answering and problem-solving.  
- General text generation with a focus on nuanced understanding.

---

## **Limitations and Warnings**

- This model is not designed for real-time or production-critical tasks.  
- Outputs may vary based on input specificity and complexity.  
- Users are responsible for ensuring ethical use and compliance with applicable regulations.  

---

## **Acknowledgments**

- Base model: Daemontatox/RA_Reasoner
- Training acceleration powered by [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.  
- Dataset contributions: [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).  

---# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__RA_Reasoner2.0-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FRA_Reasoner2.0&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

|      Metric       |Value (%)|
|-------------------|--------:|
|**Average**        |    29.00|
|IFEval (0-Shot)    |    53.66|
|BBH (3-Shot)       |    43.07|
|MATH Lvl 5 (4-Shot)|    22.89|
|GPQA (0-shot)      |     9.96|
|MuSR (0-shot)      |     7.18|
|MMLU-PRO (5-shot)  |    37.26|