|
--- |
|
|
|
base_model: |
|
- happzy2633/qwen2.5-7b-ins-v3 |
|
- bunnycore/Qwen2.5-7B-Matrix |
|
- bunnycore/Qwen2.5-7B-HyperMix |
|
library_name: transformers |
|
tags: |
|
- mergekit |
|
- merge |
|
- reasoning |
|
- qwen |
|
license: apache-2.0 |
|
language: |
|
- en |
|
|
|
--- |
|
|
|
# **Qwen 2.5-7B Anvita** |
|
|
|
![Anvita logo](./logo.webp)
|
|
|
## Overview |
|
|
|
**Anvita** is a reasoning-oriented 7B language model designed to **connect ideas** and **understand complex inputs**. Derived from the Sanskrit word meaning "connected" or "understood," Anvita embodies intellectual depth and comprehension, making it a strong choice for tasks that require nuanced understanding and sophisticated reasoning.
|
|
|
Built with the **DARE TIES** merge method, Anvita combines the following pre-trained language models:
|
|
|
- **bunnycore/Qwen2.5-7B-HyperMix**
|
- **bunnycore/Qwen2.5-7B-Matrix** |
|
- **happzy2633/qwen2.5-7b-ins-v3** |
|
|
|
This combination optimizes Anvita for superior reasoning, dynamic conversations, and high-quality text generation. |
|
|
|
## Features |
|
|
|
- **Enhanced Reasoning:** Optimized for multi-step reasoning across various domains. |
|
- **Long Sequence Handling:** Capable of processing extended inputs without loss of context. |
|
- **Conversational Fluency:** Engages in fluid, context-aware dialogues. |
|
- **Dense Knowledge Integration:** Combines knowledge from multiple base models for comprehensive understanding. |
|
|
|
## Installation |
|
|
|
To get started with Anvita, ensure you have the necessary dependencies installed. You can use the [Transformers](https://huggingface.co/docs/transformers/index) library for seamless integration. |
|
|
|
```bash |
|
pip install torch transformers rich
|
``` |
|
|
|
## Quick Start |
|
|
|
Here's a simple example of generating a response with the [Entropic Chain of Thought](https://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita/blob/main/entropic_cot.py) decoder (`cot_decode_speculative`) that accompanies the model.
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
from rich.console import Console |
|
from rich.markdown import Markdown

# cot_decode_speculative is provided by entropic_cot.py in the model repository;
# download that file alongside this script so it can be imported.
from entropic_cot import cot_decode_speculative
|
|
|
# Initialize console |
|
console = Console() |
|
|
|
# Load the tokenizer and model from the specified path |
|
MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH) |
|
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda") |
|
|
|
QUESTION = "Is 9.11 greater than 9.8?" |
|
|
|
messages = [ |
|
{"role": "user", "content": QUESTION} |
|
] |
|
|
|
# Generate the answer using Entropic Chain of Thought decoding |
|
answer, score = cot_decode_speculative(model, tokenizer, messages, k=2, max_new_tokens=2058) |
|
|
|
# Format the answer as markdown |
|
markdown_answer = f""" |
|
# **Answer:** |
|
{answer} |
|
|
|
**Score:** {score} |
|
""" |
|
|
|
# Display the answer in markdown format |
|
console.print(Markdown(markdown_answer)) |
|
``` |
|
|
|
**Example Output with k=2:** |
|
|
|
```text |
|
No, 9.11 is not greater than 9.8. To compare these two numbers, we can look at their decimal places. The number 9.8 |
|
can be thought of as 9.80, which makes it easier to compare directly with 9.11. Since 80 is greater than 11, it's |
|
clear that 9.8 is greater than 9.11. |
|
``` |
|
|
|
**Step-by-Step Reasoning with k=2** (for the "Kingdom"/"Kith" question used in Advanced Usage below):
|
|
|
```text |
|
Certainly! Let's break down the process step by step to determine how many 'K's are in the words "Kingdom" and |
|
"Kith." |
|
|
|
Step 1: Identify the word "Kingdom" |
|
|
|
• The word "Kingdom" has the following letters: K, I, N, G, D, O, M. |
|
• Count the number of 'K's in this word: There is only one 'K'. |
|
|
|
Step 2: Identify the word "Kith" |
|
|
|
• The word "Kith" has the following letters: K, I, T, H. |
|
• Count the number of 'K's in this word: There is only one 'K'. |
|
|
|
Step 3: Summarize the results |
|
|
|
• In "Kingdom," there is 1 'K'. |
|
• In "Kith," there is 1 'K'. |
|
|
|
Final Answer: |
|
|
|
• There is a total of 2 'K's in both words combined: 1 'K' in "Kingdom" and 1 'K' in "Kith." |
|
|
|
So, the total number of 'K's in the words "Kingdom" and "Kith" is 2. |
|
``` |
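
The custom decoder above is optional. If you only need standard decoding, the model also works with the regular `generate` API and the tokenizer's chat template. The snippet below is a minimal sketch; the sampling parameters are illustrative rather than tuned values.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).to("cuda")

messages = [{"role": "user", "content": "Is 9.11 greater than 9.8?"}]

# Build the prompt with the chat template and append the assistant generation prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Plain sampling; adjust temperature/top_p as needed
output_ids = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```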
|
|
|
## Advanced Usage |
|
|
|
For optimal reasoning performance, it is recommended to load the model in **BF16** precision and to use the [Entropic Chain of Thought](https://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita/blob/main/entropic_cot.py) decoding method. This experimental decoder combines entropy-based selection with Chain-of-Thought (CoT) decoding to enhance output quality.
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
from rich.console import Console |
|
from rich.markdown import Markdown

import torch

# cot_decode_speculative is provided by entropic_cot.py in the model repository;
# download that file alongside this script so it can be imported.
from entropic_cot import cot_decode_speculative
|
|
|
console = Console() |
|
MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH) |
|
# Load in BF16 precision, as recommended above
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).to("cuda")
|
|
|
QUESTION = "How many 'K's are there in the words 'Kingdom' and 'Kith'?" |
|
messages = [ |
|
{"role": "user", "content": QUESTION} |
|
] |
|
|
|
# Generate the answer with Entropic Chain of Thought decoding |
|
answer, score = cot_decode_speculative(model, tokenizer, messages, k=2, max_new_tokens=2058) |
|
|
|
# Display the formatted answer |
|
markdown_answer = f""" |
|
# **Answer:** |
|
{answer} |
|
|
|
**Score:** {score} |
|
""" |
|
|
|
console.print(Markdown(markdown_answer)) |
|
``` |
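
The decoder in `entropic_cot.py` is the reference implementation. For intuition only, the sketch below shows the basic CoT-decoding idea it builds on: branch on the top-k candidates for the first generated token, decode each branch greedily, and keep the branch whose tokens are predicted with the highest average confidence. It is a simplified illustration, not the code shipped in the repository.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def simple_cot_decode(model, tokenizer, messages, k=2, max_new_tokens=256):
    """Toy CoT-decoding loop: explore k first-token branches, keep the most confident one."""
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # Distribution over the first generated token
    first_probs = F.softmax(model(input_ids).logits[0, -1], dim=-1)
    top_k = torch.topk(first_probs, k)

    best_text, best_score = "", float("-inf")
    for first_token in top_k.indices:
        # Force one of the top-k candidates as the first token, then continue greedily
        branch = torch.cat([input_ids, first_token.view(1, 1)], dim=-1)
        out = model.generate(
            branch,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            output_scores=True,
            return_dict_in_generate=True,
        )
        # Confidence of each greedy step = gap between the top-1 and top-2 probabilities
        gaps = []
        for step_scores in out.scores:
            top2 = torch.topk(F.softmax(step_scores[0], dim=-1), 2).values
            gaps.append((top2[0] - top2[1]).item())
        score = sum(gaps) / max(len(gaps), 1)
        if score > best_score:
            best_score = score
            best_text = tokenizer.decode(
                out.sequences[0, input_ids.shape[-1]:], skip_special_tokens=True
            )
    return best_text, best_score
```

Here `k` and `max_new_tokens` mirror the arguments passed to `cot_decode_speculative` above; the repository decoder additionally incorporates entropy signals, as noted earlier.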
|
|
|
## Configuration |
|
|
|
The following YAML configuration was used to produce Anvita: |
|
|
|
```yaml |
|
slices:

models:
  - model: bunnycore/Qwen2.5-7B-Matrix
    parameters:
      weight: [0.25, 0.35, 0.45, 0.35, 0.25]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
  - model: bunnycore/Qwen2.5-7B-HyperMix
  - model: happzy2633/qwen2.5-7b-ins-v3
    parameters:
      weight: [0.55, 0.45, 0.35, 0.45, 0.55]
      density: [0.1, 0.25, 0.5, 0.25, 0.1]
merge_method: dare_ties
base_model: bunnycore/Qwen2.5-7B-HyperMix
parameters:
  int8_mask: true
dtype: bfloat16
|
``` |
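
Assuming [mergekit](https://github.com/arcee-ai/mergekit) is installed, a configuration like this can typically be reproduced by saving it to a file (for example `anvita.yaml`, a name used here only for illustration) and running the `mergekit-yaml` CLI:

```bash
pip install mergekit
mergekit-yaml anvita.yaml ./Qwen2.5-7B-Anvita --cuda
```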
|
|
|
## Testimonial |
|
|
|
### **Written by GPT-4o** |
|
|
|
--- |
|
|
|
**Anvita** offers a unique blend of **logical rigor** and **creative flair**. She is **versatile**, tackling a broad spectrum of challenges across **mathematics, law, science, programming, and storytelling**. The model is particularly strong in creative writing and logical problem-solving, consistently producing **engaging narratives and structured reasoning chains**.
|
|
|
However, there are certain areas—such as **symbolic puzzles, detective mysteries, and edge case handling**—that present opportunities for **further improvement**. Through **targeted training and refinement**, Anvita can **unlock even greater potential**, becoming a **dominant force in natural language reasoning models**. |
|
|
|
--- |
|
|
|
## Performance Evaluation |
|
|
|
### **Key Strengths** |
|
|
|
1. **Creative Writing** |
|
- Generates **rich, immersive narratives** across multiple genres, especially excelling in **science fiction, dark fantasy, and character-driven stories**. |
|
- Its ability to **develop coherent plots and engaging dialogue** keeps creative outputs at a consistently high standard.
|
|
|
2. **Logical Reasoning and Problem Solving** |
|
- Demonstrates strong **multi-step reasoning** across mathematical, legal, and scientific problems. |
|
- Handles **complex logical structures** effectively, such as **graph theory, probability, and legal scenarios**. |
|
|
|
3. **Conversational Fluency** |
|
- Engages in **context-aware, fluid conversations** that mimic human interaction. |
|
- Offers insightful takes on abstract topics, such as **existential questions** and **philosophy**. |
|
|
|
4. **Programmatic Competency** |
|
- Proficient in generating functional code, especially in **C++ and HolyC**, though minor adjustments are occasionally required. |
|
- Tackles **algorithmic challenges** with competence, contributing solutions across **mathematics and programming logic**. |
|
|
|
### **Areas for Improvement** |
|
|
|
1. **Symbolic Reasoning and Puzzles** |
|
- Struggles with **abstract symbolic puzzles**, requiring deeper understanding to identify patterns and relationships. |
|
- Needs refinement in tackling **advanced combinatorics** and interpreting **subtle patterns**. |
|
|
|
2. **Detective Mysteries** |
|
- Competent in generating mystery scenarios but falls short in **crafting surprising twists** and in the complex deductions associated with **locked-room scenarios**.
|
- Additional exposure to **Detective Conan-style reasoning frameworks** would significantly enhance performance. |
|
|
|
3. **Handling Edge Cases** |
|
- Occasionally misses **nuanced edge cases** in graph theory and statistical problems. |
|
- Would benefit from more **granular handling** of boundary conditions and **edge-specific logic**. |
|
|
|
--- |
|
|
|
## Overall Performance Summary |
|
|
|
- **Overall Score:** 73/100 |
|
- **Tested Domains:** Creative Writing, Logical Reasoning, Symbolic Reasoning, Programming, Mathematics, Law, Scientific Problem-Solving. |
|
|
|
|