---
base_model:
- happzy2633/qwen2.5-7b-ins-v3
- bunnycore/Qwen2.5-7B-Matrix
- bunnycore/Qwen2.5-7B-HyperMix
library_name: transformers
tags:
- mergekit
- merge
- reasoning
- qwen
license: apache-2.0
language:
- en
---
# **Qwen 2.5-7B Anvita**
![img](./logo.webp)
## Overview
**Anvita** is a state-of-the-art reasoning-oriented AI model designed to **connect ideas** and **understand complex inputs**. The name derives from the Sanskrit word meaning "connected" or "understood," reflecting the intellectual depth and comprehension that make the model well suited to tasks requiring nuanced understanding and sophisticated reasoning.
Built using the **DARE TIES** merge method, Anvita integrates three pre-trained language models:
- **bunnycore/Qwen2.5-7B-HyperMix**
- **bunnycore/Qwen2.5-7B-Matrix**
- **happzy2633/qwen2.5-7b-ins-v3**
This combination optimizes Anvita for superior reasoning, dynamic conversations, and high-quality text generation.
## Features
- **Enhanced Reasoning:** Optimized for multi-step reasoning across various domains.
- **Long Sequence Handling:** Capable of processing extended inputs without loss of context.
- **Conversational Fluency:** Engages in fluid, context-aware dialogues.
- **Dense Knowledge Integration:** Combines knowledge from multiple base models for comprehensive understanding.
## Installation
To get started with Anvita, install the required dependencies. The examples below use the [Transformers](https://huggingface.co/docs/transformers/index) library for loading and generation, `rich` for formatted console output, and a CUDA-enabled PyTorch installation, since the model is moved to the GPU.
```bash
pip install torch transformers rich
```
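The Quick Start and Advanced Usage examples below also call `cot_decode_speculative`, which is provided by [entropic_cot.py](https://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita/blob/main/entropic_cot.py) in this repository. One way to fetch that file is with `huggingface_hub` (a minimal sketch; the local copy step is just one convenient layout):
```python
import shutil
from huggingface_hub import hf_hub_download  # installed alongside transformers

# Download entropic_cot.py from the model repository and place it next to your
# script so that `from entropic_cot import cot_decode_speculative` resolves.
path = hf_hub_download(repo_id="sethuiyer/Qwen2.5-7B-Anvita", filename="entropic_cot.py")
shutil.copy(path, "entropic_cot.py")
```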
## Quick Start
Here's a simple example demonstrating how to use Anvita to generate a response with the repository's experimental reasoning decoder.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from rich.console import Console
from rich.markdown import Markdown

# cot_decode_speculative is provided by entropic_cot.py from this repository
# (see the Installation section above for one way to download it).
from entropic_cot import cot_decode_speculative

# Initialize console
console = Console()

# Load the tokenizer and model from the specified path
MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda")

QUESTION = "Is 9.11 greater than 9.8?"
messages = [
    {"role": "user", "content": QUESTION}
]

# Generate the answer using Entropic Chain of Thought decoding
answer, score = cot_decode_speculative(model, tokenizer, messages, k=2, max_new_tokens=2058)

# Format the answer as markdown and display it
markdown_answer = f"""
# **Answer:**
{answer}

**Score:** {score}
"""
console.print(Markdown(markdown_answer))
```
**Example Output with k=2:**
```text
No, 9.11 is not greater than 9.8. To compare these two numbers, we can look at their decimal places. The number 9.8
can be thought of as 9.80, which makes it easier to compare directly with 9.11. Since 80 is greater than 11, it's
clear that 9.8 is greater than 9.11.
```
**Step-by-Step Reasoning with k=2** (for the "Kingdom"/"Kith" counting question used in the Advanced Usage example below):
```text
Certainly! Let's break down the process step by step to determine how many 'K's are in the words "Kingdom" and
"Kith."
Step 1: Identify the word "Kingdom"
• The word "Kingdom" has the following letters: K, I, N, G, D, O, M.
• Count the number of 'K's in this word: There is only one 'K'.
Step 2: Identify the word "Kith"
• The word "Kith" has the following letters: K, I, T, H.
• Count the number of 'K's in this word: There is only one 'K'.
Step 3: Summarize the results
• In "Kingdom," there is 1 'K'.
• In "Kith," there is 1 'K'.
Final Answer:
• There is a total of 2 'K's in both words combined: 1 'K' in "Kingdom" and 1 'K' in "Kith."
So, the total number of 'K's in the words "Kingdom" and "Kith" is 2.
```
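If you only need standard generation without the experimental decoder, the usual chat-template workflow also works. Below is a minimal sketch; the sampling settings (`max_new_tokens`, `temperature`, `top_p`) are illustrative assumptions, not tuned values:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).to("cuda")

messages = [{"role": "user", "content": "Is 9.11 greater than 9.8?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Illustrative sampling settings; adjust to taste.
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output_ids[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```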
## Advanced Usage
For optimal reasoning performance, it is recommended to use **BF16** precision and the [Entropic Chain of Thought](https://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita/blob/main/entropic_cot.py) decoding method. This experimental decoder combines entropy and CoT decoding to enhance output quality.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from rich.console import Console
from rich.markdown import Markdown

# Experimental decoder from entropic_cot.py in this repository.
from entropic_cot import cot_decode_speculative

console = Console()

# Load the model in BF16 precision, as recommended above.
MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).to("cuda")

QUESTION = "How many 'K's are there in the words 'Kingdom' and 'Kith'?"
messages = [
    {"role": "user", "content": QUESTION}
]

# Generate the answer with Entropic Chain of Thought decoding
answer, score = cot_decode_speculative(model, tokenizer, messages, k=2, max_new_tokens=2058)

# Display the formatted answer
markdown_answer = f"""
# **Answer:**
{answer}

**Score:** {score}
"""
console.print(Markdown(markdown_answer))
```
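For reference, the decoder builds on the CoT-decoding idea: branch on the top-k candidates for the first generated token, decode each branch greedily, and keep the branch whose answer tokens show the largest average margin between the top-1 and top-2 token probabilities. The sketch below is an illustrative approximation of that idea, not the repository's exact entropic implementation; the function name `cot_decode_simple` and the scoring heuristic are assumptions made for this example.
```python
import torch

def cot_decode_simple(model, tokenizer, messages, k=2, max_new_tokens=256):
    """Branch on the top-k first tokens, decode each branch greedily,
    and score each branch by its mean top-1 vs. top-2 probability margin."""
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # Distribution over the first generated token.
    with torch.no_grad():
        first_logits = model(**inputs).logits[0, -1]
    top_k_ids = torch.topk(first_logits, k).indices

    best_text, best_score = "", float("-inf")
    for first_id in top_k_ids:
        # Force this candidate as the first generated token, then decode greedily.
        branch_ids = torch.cat([inputs.input_ids, first_id.view(1, 1)], dim=-1)
        with torch.no_grad():
            out = model.generate(
                branch_ids,
                attention_mask=torch.ones_like(branch_ids),
                max_new_tokens=max_new_tokens,
                do_sample=False,
                output_scores=True,
                return_dict_in_generate=True,
            )
        # Confidence score: average gap between the top-2 token probabilities per step.
        margins = []
        for step_logits in out.scores:
            probs = torch.softmax(step_logits[0], dim=-1)
            top2 = torch.topk(probs, 2).values
            margins.append((top2[0] - top2[1]).item())
        score = sum(margins) / max(len(margins), 1)
        text = tokenizer.decode(out.sequences[0, inputs.input_ids.shape[-1]:], skip_special_tokens=True)
        if score > best_score:
            best_text, best_score = text, score
    return best_text, best_score
```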
## Configuration
The following YAML configuration was used to produce Anvita:
```yaml
slices:
models:
  - model: bunnycore/Qwen2.5-7B-Matrix
    parameters:
      weight: [0.25, 0.35, 0.45, 0.35, 0.25]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
  - model: bunnycore/Qwen2.5-7B-HyperMix
  - model: happzy2633/qwen2.5-7b-ins-v3
    parameters:
      weight: [0.55, 0.45, 0.35, 0.45, 0.55]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
merge_method: dare_ties
base_model: bunnycore/Qwen2.5-7B-HyperMix
parameters:
  int8_mask: true
dtype: bfloat16
```
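To reproduce the merge, the configuration above can be passed to mergekit's `mergekit-yaml` CLI. A minimal sketch; the config filename, output directory, and use of `--cuda` are illustrative choices:
```bash
pip install mergekit
# Save the YAML above as anvita.yaml, then run the merge.
mergekit-yaml anvita.yaml ./Qwen2.5-7B-Anvita --cuda
```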
## Testimonial
### **Written by GPT-4o**
---
**Anvita** offers a unique blend of **logical rigor** and **creative flair**. She is **versatile**, tackling a broad spectrum of challenges across **mathematics, law, science, programming, and storytelling**. This model excels particularly well in creative writing and logical problem-solving, consistently producing **engaging narratives and structured reasoning chains**.
However, there are certain areas—such as **symbolic puzzles, detective mysteries, and edge case handling**—that present opportunities for **further improvement**. Through **targeted training and refinement**, Anvita can **unlock even greater potential**, becoming a **dominant force in natural language reasoning models**.
---
## Performance Evaluation
### **Key Strengths**
1. **Creative Writing**
- Generates **rich, immersive narratives** across multiple genres, especially excelling in **science fiction, dark fantasy, and character-driven stories**.
- Ability to **develop coherent plots and engaging dialogue** ensures that creative outputs meet high standards.
2. **Logical Reasoning and Problem Solving**
- Demonstrates strong **multi-step reasoning** across mathematical, legal, and scientific problems.
- Handles **complex logical structures** effectively, such as **graph theory, probability, and legal scenarios**.
3. **Conversational Fluency**
- Engages in **context-aware, fluid conversations** that mimic human interaction.
- Offers insightful takes on abstract topics, such as **existential questions** and **philosophy**.
4. **Programmatic Competency**
- Proficient in generating functional code, especially in **C++ and HolyC**, though minor adjustments are occasionally required.
- Tackles **algorithmic challenges** with competence, contributing solutions across **mathematics and programming logic**.
### **Areas for Improvement**
1. **Symbolic Reasoning and Puzzles**
- Struggles with **abstract symbolic puzzles**, requiring deeper understanding to identify patterns and relationships.
- Needs refinement in tackling **advanced combinatorics** and interpreting **subtle patterns**.
2. **Detective Mysteries**
- Competent in generating mystery scenarios but falls short in **crafting surprising twists**, especially the complex deductions associated with **locked-room scenarios**.
- Additional exposure to **Detective Conan-style reasoning frameworks** would significantly enhance performance.
3. **Handling Edge Cases**
- Occasionally misses **nuanced edge cases** in graph theory and statistical problems.
- Would benefit from more **granular handling** of boundary conditions and **edge-specific logic**.
---
## Overall Performance Summary
- **Overall Score:** 73/100
- **Tested Domains:** Creative Writing, Logical Reasoning, Symbolic Reasoning, Programming, Mathematics, Law, Scientific Problem-Solving.