---
base_model: Daemontatox/RA_Reasoner
license: apache-2.0
datasets:
- Daemontatox/Deepthinking-COT
language:
- en
new_version: Daemontatox/RA_Reasoner2.0
library_name: transformers
tags:
- COT
- Reasoning
- text-generation-inference
pipeline_tag: text-generation
model-index:
- name: RA_Reasoner2.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: wis-k/instruction-following-eval
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 53.66
      name: averaged accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: SaylorTwift/bbh
      split: test
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 43.07
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: lighteval/MATH-Hard
      split: test
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 22.89
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      split: train
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 9.96
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.18
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 37.26
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
      name: Open LLM Leaderboard
---

![RA_REASONER](./image.webp)

# **RA_Reasoner 2.0**

## **Model Details**

- **Developed by:** [Daemontatox](#)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Base Model:** Daemontatox/RA_Reasoner

This model is fine-tuned from the Falcon-10B-Instruct model, leveraging advanced training optimizations to enhance reasoning and instruction-following capabilities. It was trained 2x faster using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

---

## **Training Details**

- **Frameworks Used:** Unsloth, Hugging Face TRL
- **Fine-Tuning Focus:** Emphasis on reasoning, logic-based tasks, and instruction comprehension.
- **Dataset:** Includes examples from [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).
- **Optimization:** Significant speedup during fine-tuning while maintaining model quality.

Further details on hyperparameters and fine-tuning methodology will be added in future updates.

---

## **Intended Use**

This model is intended for **research and development** in text generation, reasoning tasks, and instruction-following applications.

### **Key Features:**

- Enhanced reasoning capabilities for multi-step logical problems.
- Robust instruction-following for complex tasks.
- Fine-tuned for Chain-of-Thought (COT) reasoning and inference.

### **Applications:**

- Research on reasoning-based AI systems.
- Tasks requiring logical deductions, such as question answering and problem-solving.
- General text generation with a focus on nuanced understanding.

---

## **Limitations and Warnings**

- This model is not designed for real-time or production-critical tasks.
- Outputs may vary based on input specificity and complexity.
- Users are responsible for ensuring ethical use and compliance with applicable regulations.

---

## **Acknowledgments**

- Base model: Daemontatox/RA_Reasoner
- Training acceleration powered by [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
- Dataset contributions: [Daemontatox/Deepthinking-COT](https://huggingface.co/datasets/Daemontatox/Deepthinking-COT).

---

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__RA_Reasoner2.0-details)!
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FRA_Reasoner2.0&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!

| Metric              | Value (%) |
|---------------------|----------:|
| **Average**         |     29.00 |
| IFEval (0-Shot)     |     53.66 |
| BBH (3-Shot)        |     43.07 |
| MATH Lvl 5 (4-Shot) |     22.89 |
| GPQA (0-shot)       |      9.96 |
| MuSR (0-shot)       |      7.18 |
| MMLU-PRO (5-shot)   |     37.26 |
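
The **Average** row can be reproduced from the six benchmark scores. The sketch below assumes the average is a plain unweighted mean of the listed values, which matches the reported 29.00. The helper name `leaderboard_average` is illustrative and not part of any leaderboard tooling.

```python
# Scores copied from the results table (percent).
scores = {
    "IFEval (0-Shot)": 53.66,
    "BBH (3-Shot)": 43.07,
    "MATH Lvl 5 (4-Shot)": 22.89,
    "GPQA (0-shot)": 9.96,
    "MuSR (0-shot)": 7.18,
    "MMLU-PRO (5-shot)": 37.26,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean (assumed aggregation, see note above)."""
    values = list(values)
    return sum(values) / len(values)


average = leaderboard_average(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 29.00"
```

Because each benchmark carries equal weight under this assumption, the low GPQA and MuSR scores pull the average well below the IFEval and BBH results.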