---
base_model: Daemontatox/RA_Reasoner
license: apache-2.0
datasets:
- Daemontatox/Deepthinking-COT
language:
- en
new_version: Daemontatox/RA_Reasoner2.0
library_name: transformers
tags:
- COT
- Reasoning
- text-generation-inference
pipeline_tag: text-generation
model-index:
- name: RA_Reasoner2.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 53.66
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 43.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 22.89
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.96
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.18
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.26
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
---

![RA_REASONER](./image.webp)

# **RA_Reasoner 2.0**

## **Model Details**

**Developed by:** [Daemontatox](https://huggingface.co/Daemontatox)  
**License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)  
**Base Model:** Daemontatox/RA_Reasoner

This model is fine-tuned from [Daemontatox/RA_Reasoner](https://huggingface.co/Daemontatox/RA_Reasoner) (itself derived from Falcon-10B-Instruct), leveraging advanced training optimizations to enhance reasoning and instruction-following capabilities. It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's [TRL](https://github.com/huggingface/trl) library.  
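
A minimal inference sketch with `transformers` is shown below. The model ID comes from the card metadata; the chain-of-thought prompt wording and generation settings are illustrative assumptions, not values prescribed by this card.

```python
# Minimal inference sketch; prompt format and settings are illustrative.
MODEL_ID = "Daemontatox/RA_Reasoner2.0"  # from the card metadata


def build_cot_prompt(question: str) -> str:
    """Wrap a question in a simple chain-of-thought style instruction."""
    return (
        "Answer the following question. Reason step by step, "
        "then state the final answer.\n\n"
        f"Question: {question}\nReasoning:"
    )


def generate(question: str, max_new_tokens: int = 512) -> str:
    """Lazily load the model and generate a completion (needs GPU/RAM)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(
        build_cot_prompt(question), return_tensors="pt"
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For reasoning-heavy prompts, leaving room in `max_new_tokens` for the intermediate steps generally matters more than sampling settings.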

---

## **Training Details**

- **Frameworks Used:** Unsloth, Hugging Face TRL  
- **Fine-Tuning Focus:** Emphasis on reasoning, logic-based tasks, and instruction comprehension.  
- **Dataset:** Includes examples from [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).  
- **Optimization:** Significant speedup during fine-tuning while maintaining model quality.  

Further details on hyperparameters and fine-tuning methodology will be added in future updates.
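
As a rough illustration of the TRL-based setup described above, the sketch below wires the dataset into an `SFTTrainer` run. All hyperparameters, and the dataset column names `instruction`/`response`, are assumptions rather than documented values.

```python
# Hypothetical fine-tuning sketch with TRL; hyperparameters and dataset
# column names ("instruction"/"response") are assumptions.
def format_example(example: dict) -> str:
    """Flatten one assumed {instruction, response} record into training text."""
    return (
        f"### Question:\n{example['instruction']}\n"
        f"### Answer:\n{example['response']}"
    )


def train() -> None:
    """Run a supervised fine-tune over the card's dataset (needs GPU)."""
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("Daemontatox/Deepthinking-COT", split="train")
    dataset = dataset.map(lambda ex: {"text": format_example(ex)})

    trainer = SFTTrainer(
        model="Daemontatox/RA_Reasoner",
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="ra-reasoner-2.0",
            max_seq_length=2048,
            per_device_train_batch_size=2,
            num_train_epochs=1,
        ),
    )
    trainer.train()
```

Unsloth's speedup would slot in by loading the base model through `FastLanguageModel` before handing it to the trainer; that step is omitted here for brevity.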

---

## **Intended Use**

This model is intended for **research and development** in text generation, reasoning tasks, and instruction-following applications.  

### **Key Features:**
- Enhanced reasoning capabilities for multi-step logical problems.
- Robust instruction-following for complex tasks.
- Fine-tuned for Chain-of-Thought (COT) reasoning and inference.  

### **Applications:**
- Research on reasoning-based AI systems.  
- Tasks requiring logical deductions, such as question answering and problem-solving.  
- General text generation with a focus on nuanced understanding.

---

## **Limitations and Warnings**

- This model is not designed for real-time or production-critical tasks.  
- Outputs may vary based on input specificity and complexity.  
- Users are responsible for ensuring ethical use and compliance with applicable regulations.  

---

## **Acknowledgments**

- Base model: Daemontatox/RA_Reasoner
- Training acceleration powered by [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.  
- Dataset contributions: [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).  

---

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__RA_Reasoner2.0-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FRA_Reasoner2.0&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

|      Metric       |Value (%)|
|-------------------|--------:|
|**Average**        |    29.00|
|IFEval (0-Shot)    |    53.66|
|BBH (3-Shot)       |    43.07|
|MATH Lvl 5 (4-Shot)|    22.89|
|GPQA (0-shot)      |     9.96|
|MuSR (0-shot)      |     7.18|
|MMLU-PRO (5-shot)  |    37.26|