|
--- |
|
tags: |
|
- long-cot-reasoning |
|
- transformers |
|
- mamba2 |
|
- llms |
|
- chain-of-thought |
|
license: apache-2.0 |
|
language: |
|
- en |
|
datasets: |
|
- Daemontatox/LongCOT-Reason |
|
- Daemontatox/alpaca_reasoning_COT |
|
base_model: |
|
- Qwen/Qwen2.5-14B-Instruct |
|
pipeline_tag: text-generation |
|
library_name: transformers |
|
model-index: |
|
- name: Sphinx2.0 |
|
results: |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: IFEval (0-Shot) |
|
type: wis-k/instruction-following-eval |
|
split: train |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: inst_level_strict_acc and prompt_level_strict_acc |
|
value: 71.23 |
|
name: averaged accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FSphinx2.0 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: BBH (3-Shot) |
|
type: SaylorTwift/bbh |
|
split: test |
|
args: |
|
num_few_shot: 3 |
|
metrics: |
|
- type: acc_norm |
|
value: 49.4 |
|
name: normalized accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FSphinx2.0 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MATH Lvl 5 (4-Shot) |
|
type: lighteval/MATH-Hard |
|
split: test |
|
args: |
|
num_few_shot: 4 |
|
metrics: |
|
- type: exact_match |
|
value: 2.72 |
|
name: exact match |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FSphinx2.0 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: GPQA (0-shot) |
|
type: Idavidrein/gpqa |
|
split: train |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 5.82 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FSphinx2.0 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MuSR (0-shot) |
|
type: TAUR-Lab/MuSR |
|
args: |
|
num_few_shot: 0 |
|
metrics: |
|
- type: acc_norm |
|
value: 13.05 |
|
name: acc_norm |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FSphinx2.0 |
|
name: Open LLM Leaderboard |
|
- task: |
|
type: text-generation |
|
name: Text Generation |
|
dataset: |
|
name: MMLU-PRO (5-shot) |
|
type: TIGER-Lab/MMLU-Pro |
|
config: main |
|
split: test |
|
args: |
|
num_few_shot: 5 |
|
metrics: |
|
- type: acc |
|
value: 46.49 |
|
name: accuracy |
|
source: |
|
url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=Daemontatox%2FSphinx2.0 |
|
name: Open LLM Leaderboard |
|
--- |
|
|
|
![Sphinx of Reasoning](./image.webp) |
|
|
|
# **Sphinx: The Apex of Logical Deduction and Chain-of-Thought Reasoning** |
|
|
|
- **Developed by:** Daemontatox |
|
- **License:** Apache-2.0 |
|
- **Base Model:** Fine-tuned from `unsloth/qwen2.5-14b-instruct-bnb-4bit` |
|
- **Accelerated by:** [Unsloth Framework](https://github.com/unslothai/unsloth) |
|
- **TRL-Optimized:** Integrated with Hugging Face's TRL library for enhanced performance in logical reasoning.
|
|
|
## **Unveiling Sphinx: Master of Reasoned Thought** |
|
|
|
Sphinx is a Long Chain-of-Thought (CoT) reasoning model built for complex challenges that demand rigorous logical analysis. Built on the Qwen2.5 architecture, Sphinx constructs coherent, step-by-step thought processes, giving clear insight into its reasoning and making its conclusions easy to follow and verify.
|
|
|
> _"Where complexity yields to logical clarity."_ |
|
|
|
### **Core Strengths: Reasoning, Logic, and CoT** |
|
|
|
- **Unrivaled Chain-of-Thought (CoT) Mastery:** Engineered for dissecting intricate problems, Sphinx meticulously constructs each step of its reasoning, offering a transparent and verifiable pathway to the solution. |
|
- **Deep Logical Reasoning Capabilities:** Sphinx is adept at navigating complex logical structures, drawing valid inferences and forming sound conclusions through multi-layered analysis. |
|
- **Exceptional Reasoning Fidelity:** Fine-tuned to maintain the highest standards of logical consistency, Sphinx delivers outputs that are not only correct but also demonstrably well-reasoned. |
|
- **Efficient Long-Context Reasoning:** Trained efficiently with Unsloth, Sphinx processes extensive inputs while maintaining logical coherence across extended reasoning chains.
|
- **Explainable AI through Transparent Logic:** Sphinx's inherent CoT approach provides explicit and understandable reasoning, making its decision-making process transparent and trustworthy. |
|
|
|
## **Model Architecture and Fine-tuning for Logical Prowess** |
|
|
|
### **Architectural Foundation** |
|
|
|
- **Base Model:** Qwen2.5-14B - Renowned for its strong general language understanding, forming a solid basis for specialized reasoning. |
|
- **Parameters:** 14 billion - Providing the capacity to model intricate reasoning patterns. |
|
- **Quantization:** 4-bit precision using BitsAndBytes (bnb) - Optimizing for accessibility without sacrificing logical reasoning accuracy. |
|
- **Extended Reasoning Window:** Supports inputs up to 16k tokens, crucial for accommodating the detailed context required for complex logical deductions. |
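Assuming the model is published under the repository id `Daemontatox/Sphinx2.0` (the repo this card describes), a minimal loading sketch in 4-bit precision with `transformers` and `bitsandbytes` might look like the following. The heavy dependencies are imported lazily inside the function so the sketch stays importable on machines without a GPU:

```python
MODEL_ID = "Daemontatox/Sphinx2.0"  # assumption: matches this card's repo id


def load_sphinx():
    """Load Sphinx in 4-bit NF4 precision (requires a CUDA GPU)."""
    # Imported lazily so this module can be imported without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_type="nf4",
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        quantization_config=bnb_config,
        device_map="auto",  # shard across available devices
    )
    return model, tokenizer
```

On hardware without enough VRAM, the same call works with `device_map="auto"` offloading layers to CPU, at a latency cost.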
|
|
|
### **Training Methodology: Honing Logical Acumen** |
|
|
|
- **Frameworks:** Hugging Face Transformers + TRL + Unsloth - A powerful combination for efficient training and reinforcement learning.
|
- **Data Sources:** A meticulously curated collection of datasets specifically designed to challenge and refine logical reasoning skills, encompassing academic, legal, and formal logic domains. |
|
- **Optimization Strategies:** |
|
- **LoRA (Low-Rank Adaptation):** Enabling parameter-efficient fine-tuning, focusing on adapting the model for superior logical inference. |
|
- **Reinforcement Learning from Human Feedback (RLHF):** Guiding the model towards generating more logically sound and human-aligned reasoning steps. |
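The card does not publish the exact fine-tuning hyperparameters. Purely as an illustration, a parameter-efficient LoRA setup for a Qwen2.5-style model is typically expressed as a small configuration like the one below (all values are hypothetical, chosen only to show the shape of such a config):

```python
# Hypothetical LoRA hyperparameters -- the card does not publish the real values.
lora_config = {
    "r": 16,               # low-rank adapter dimension
    "lora_alpha": 32,      # scaling factor applied to the adapter output
    "lora_dropout": 0.05,
    # Attention and MLP projection matrices typical for Qwen2.5-style blocks:
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",
}
```

These keys mirror the fields of `peft.LoraConfig`, so the dict could be splatted into it directly (`LoraConfig(**lora_config)`).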
|
|
|
## **Sphinx's Reasoning Toolkit: Capabilities in Action** |
|
|
|
1. **Masterful Long-CoT Generation:** Deconstructs and conquers multi-layered problems by constructing detailed, logically interconnected reasoning sequences. |
|
2. **Explanatory Power through Logic:** Provides clear, step-by-step logical derivations for its outputs, enhancing trust and understanding. |
|
3. **Adaptable Logical Framework:** Easily tailored to specialized reasoning tasks through targeted fine-tuning, enabling application in diverse logical domains. |
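Qwen2.5-based models converse in the ChatML format, so eliciting a long reasoning chain is mostly a matter of prompt construction. In practice you would call `tokenizer.apply_chat_template(...)`; the dependency-free sketch below hand-rolls the same format only to make it explicit:

```python
def build_cot_prompt(
    question: str,
    system: str = "You are Sphinx, a careful step-by-step reasoner.",
) -> str:
    """Render a single-turn ChatML prompt as used by Qwen2.5-family models.

    Prefer tokenizer.apply_chat_template(...) in real code; this version
    exists only to show the wire format.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}\n"
        "Think through the problem step by step before giving a final answer."
        "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_cot_prompt(
    "If all bloops are razzies and all razzies are lazzies, "
    "are all bloops lazzies?"
)
```

The resulting string is what the tokenizer would produce for a one-turn chat; the trailing `<|im_start|>assistant\n` cues the model to begin its reasoning chain.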
|
|
|
## **Unlocking Potential: Applications Driven by Logic** |
|
|
|
- **Advanced Academic Research:** Generating in-depth, logically structured analyses for complex scientific and philosophical inquiries. |
|
- **Robust Legal Reasoning Assistance:** Constructing and articulating multi-step legal arguments with precision and logical rigor. |
|
- **Transformative STEM Education:** Guiding learners through intricate mathematical and logical problems with clear, step-by-step explanations. |
|
- **Transparent Cognitive AI Systems:** Powering AI systems where explainability and logical justification are paramount for decision-making.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
|
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/Daemontatox__Sphinx2.0-details)! |
|
Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=Daemontatox%2FSphinx2.0&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)! |
|
|
|
| Metric |Value (%)| |
|
|-------------------|--------:| |
|
|**Average** | 31.45| |
|
|IFEval (0-Shot) | 71.23| |
|
|BBH (3-Shot) | 49.40| |
|
|MATH Lvl 5 (4-Shot)| 2.72| |
|
|GPQA (0-shot) | 5.82| |
|
|MuSR (0-shot) | 13.05| |
|
|MMLU-PRO (5-shot) | 46.49| |
|
|
|
|
|
|
|