metadata

base_model: Daemontatox/RA_Reasoner
license: apache-2.0
datasets:
  - Daemontatox/Deepthinking-COT
language:
  - en
new_version: Daemontatox/RA_Reasoner2.0
library_name: transformers
tags:
  - COT
  - Reasoning
  - text-generation-inference
pipeline_tag: text-generation
model-index:
  - name: RA_Reasoner2.0
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: wis-k/instruction-following-eval
          split: train
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 53.66
            name: averaged accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: SaylorTwift/bbh
          split: test
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 43.07
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: lighteval/MATH-Hard
          split: test
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 22.89
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          split: train
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 9.96
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 7.18
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 37.26
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FRA_Reasoner2.0
          name: Open LLM Leaderboard

RA_Reasoner 2.0

Model Details

Developed by: Daemontatox
License: Apache 2.0
Base Model: Daemontatox/RA_Reasoner

This model is fine-tuned from the Falcon-10B-Instruct model, leveraging advanced training optimizations to enhance reasoning and instruction-following capabilities. It was trained 2x faster using Unsloth and Hugging Face's TRL library.

Training Details

Frameworks Used: Unsloth, Hugging Face TRL
Fine-Tuning Focus: Emphasis on reasoning, logic-based tasks, and instruction comprehension.
Dataset: Includes examples from Daemontatox/Deepthinking-COT.
Optimization: Significant speedup during fine-tuning while maintaining model quality.

Further details on hyperparameters and fine-tuning methodology will be added in future updates.

Intended Use

This model is intended for research and development in text generation, reasoning tasks, and instruction-following applications.

Key Features:

Enhanced reasoning capabilities for multi-step logical problems.
Robust instruction-following for complex tasks.
Fine-tuned for Chain-of-Thought (COT) reasoning and inference.

Applications:

Research on reasoning-based AI systems.
Tasks requiring logical deductions, such as question answering and problem-solving.
General text generation with a focus on nuanced understanding.

Limitations and Warnings

This model is not designed for real-time or production-critical tasks.
Outputs may vary based on input specificity and complexity.
Users are responsible for ensuring ethical use and compliance with applicable regulations.

Acknowledgments

Base model: Daemontatox/RA_Reasoner
Training acceleration powered by Unsloth and Hugging Face's TRL library.
Dataset contributions: Daemontatox/Deepthinking-COT.

---# Open LLM Leaderboard Evaluation Results Detailed results can be found here! Summarized results can be found here!

Metric	Value (%)
Average	29.00
IFEval (0-Shot)	53.66
BBH (3-Shot)	43.07
MATH Lvl 5 (4-Shot)	22.89
GPQA (0-shot)	9.96
MuSR (0-shot)	7.18
MMLU-PRO (5-shot)	37.26