Commit dc6f8ca: "Update README.md" (committed by sethuiyer)
Parent(s): f4281f4

Files changed (1):
  README.md (+142, -100)
README.md CHANGED
@@ -1,34 +1,62 @@
- ---
- base_model:
- - happzy2633/qwen2.5-7b-ins-v3
- - bunnycore/Qwen2.5-7B-Matrix
- - bunnycore/Qwen2.5-7B-HyperMix
- library_name: transformers
- tags:
- - mergekit
- - merge
- - reasoning
- - qwen
- license: apache-2.0
- language:
- - en
- ---
- ## Qwen 2.5-7B-Anvita
- Anvita Model is a reasoning-oriented AI model based on a Sanskrit word meaning "connected" or "understood." "Anvita" reflects the model's purpose to "connect ideas" and "understand" complex inputs, symbolizing intellectual depth and comprehension.
-
- Built using the DARE TIES merge method, it combines pre-trained language models such as Qwen2.5-7B-HyperMix and others, optimized for reasoning, conversation, and text generation.
-
- The model configuration emphasizes long sequence lengths, conversation datasets, and dense reasoning abilities.
-
- ## Note:
- If you want good reasoning power from this model, please use BF16 and [Entropic Chain of Thought](https://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita/blob/main/entropic_cot.py) decoding, an experimental decoder mixing
- entropix and CoT decoding.
 
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
- # Initialize console
  from rich.console import Console
  from rich.markdown import Markdown
  console = Console()

  # Load the tokenizer and model from the specified path
@@ -37,15 +65,13 @@ MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita"
  tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
  model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda")

- QUESTION = '''
- is 9.11 greater than 9.8?
- '''
  messages = [
      {"role": "user", "content": QUESTION}
  ]

-
- # Generate the answer
  answer, score = cot_decode_speculative(model, tokenizer, messages, k=2, max_new_tokens=2058)

  # Format the answer as markdown
@@ -56,28 +82,20 @@ markdown_answer = f"""
  **Score:** {score}
  """

- # Use rich to display the answer in markdown format
  console.print(Markdown(markdown_answer))
  ```

- with k=2:
  ```text
  No, 9.11 is not greater than 9.8. To compare these two numbers, we can look at their decimal places. The number 9.8
  can be thought of as 9.80, which makes it easier to compare directly with 9.11. Since 80 is greater than 11, it's
- clear that 9.8 is greater than 9.11.
  ```

- Question: How many K are there in the word Kingdom & Kith?
- k=1:
- ```text
- In the word "Kingdom," there are two 'K's: K-I-N-G-O-M-D.
-
- In the word "Kith," there is one 'K': K-I-T-H.
-
- So, there are a total of three 'K's in both words combined: two in "Kingdom" and one in "Kith."
- ```
-
- with k=2 and let's think step by step in prompt:
  ```text
  Certainly! Let's break down the process step by step to determine how many 'K's are in the words "Kingdom" and
  "Kith."
@@ -85,9 +103,9 @@ Certainly! Let's break down the process step by step to determine how many 'K's
  Step 1: Identify the word "Kingdom"

  • The word "Kingdom" has the following letters: K, I, N, G, D, O, M.
- • Count the number of 'K's in this word: There is only one 'K'.

- Step 2: Identify the word "Kith"

  • The word "Kith" has the following letters: K, I, T, H.
  • Count the number of 'K's in this word: There is only one 'K'.
@@ -101,92 +119,116 @@ Final Answer:

  • There is a total of 2 'K's in both words combined: 1 'K' in "Kingdom" and 1 'K' in "Kith."

- So, the total number of 'K's in the words "Kingdom" and "Kith" is 2.
  ```

- ### Configuration

- The following YAML configuration was used to produce this model:

- ```yaml
  slices:
- models:
-   - model: bunnycore/Qwen2.5-7B-Matrix
-     parameters:
-       weight: [0.25, 0.35, 0.45, 0.35, 0.25]
-       density: [0.1, 0.25, 0.5, 0.25, 0.1]
-   - model: bunnycore/Qwen2.5-7B-HyperMix
-   - model: happzy2633/qwen2.5-7b-ins-v3
-     parameters:
-       weight: [0.55, 0.45, 0.35, 0.45, 0.55]
-       density: [0.1, 0.25, 0.5, 0.25, 0.1]
  merge_method: dare_ties
  base_model: bunnycore/Qwen2.5-7B-HyperMix
  parameters:
    int8_mask: true
  dtype: bfloat16
-
  ```
- ____
132
 
 
133
 
134
- ### **Testimonial for Anvita (Qwen 2.5-7B-Anvita)**
135
- **Written by GPT-4o**
136
 
137
  ---
138
 
139
- Anvita offers a unique blend of **logical rigor** and **creative flair.** She is **versatile**, tackling a broad spectrum of challenges across **mathematics, law, science, programming, and storytelling**. This model excels particularly well in creative writing and logical problem-solving, consistently producing **engaging narratives and structured reasoning chains**.
140
 
141
  However, there are certain areas—such as **symbolic puzzles, detective mysteries, and edge case handling**—that present opportunities for **further improvement**. Through **targeted training and refinement**, Anvita can **unlock even greater potential**, becoming a **dominant force in natural language reasoning models**.
142
 
143
  ---
144
 
145
- ### **Performance Evaluation**
146
 
147
- #### **Key Strengths:**
148
- 1. **Creative Writing:**
149
- - Anvita generates **rich, immersive narratives** across multiple genres, especially excelling in **science fiction, dark fantasy, and character-driven stories**.
150
- - Her ability to **develop coherent plots and engaging dialogue** ensures that creative outputs meet high standards.
151
-
152
- 2. **Logical Reasoning and Problem Solving:**
153
- - Demonstrates strong **multi-step reasoning** across mathematical, legal, and scientific problems.
154
- - Handles **complex logical structures** effectively, such as **graph theory, probability, and legal scenarios**.
155
 
156
- 3. **Conversational Fluency:**
157
- - Engages in **context-aware, fluid conversations** that mimic human interaction.
158
- - Offers insightful takes on abstract topics, such as **existential questions** and **philosophy**.
159
-
160
- 4. **Programmatic Competency:**
161
- - Shows proficiency in generating functional code, especially in **C++ and HolyC**, though minor adjustments are occasionally required.
162
- - Tackles **algorithmic challenges** with competence, contributing solutions across **mathematics and programming logic**.
163
 
164
- ---
 
 
165
 
166
- #### **Areas for Improvement:**
167
- 1. **Symbolic Reasoning and Puzzles:**
168
- - Struggles with **abstract symbolic puzzles**, requiring deeper understanding to identify patterns and relationships.
169
- - Needs refinement in tackling **advanced combinatorics** and interpreting **subtle patterns.**
170
 
171
- 2. **Detective Mysteries:**
172
- - Though competent in generating mystery scenarios, she falls short in **crafting surprising twists**—especially the complex deductions associated with **locked-room scenarios.**
173
- - Additional exposure to **Detective Conan-style reasoning frameworks** would significantly enhance her performance.
174
 
175
- 3. **Handling Edge Cases:**
176
- - Occasionally misses **nuanced edge cases** in graph theory and statistical problems.
177
- - Would benefit from more **granular handling** of boundary conditions and **edge-specific logic.**
178
 
179
- ---
 
 
180
 
181
- ### **Overall Performance Summary**
182
- **Overall Score:** **73/100**
183
- **Tested Domains:** Creative Writing, Logical Reasoning, Symbolic Reasoning, Programming, Mathematics, Law, Scientific Problem-Solving.
184
 
185
- Anvita has shown **remarkable versatility** and **adaptability**, managing to excel across a **diverse range of domains.** Her **logical reasoning** and **creative flair** make her stand out as a well-rounded model, capable of **handling both structured problem-solving** and **free-form text generation.**
 
 
186
 
187
  ---
188
 
189
- ### **Recommendation:**
190
- Anvita is **highly recommended** for **applications requiring deep reasoning, problem-solving, creative content generation, and conversational interactions.** Her **7B parameter architecture** ensures she offers an **efficient yet powerful performance**, making her accessible for developers and researchers alike.
 
 
191
 
192
- With **further refinement**, Anvita can become an even more formidable model—capable of **handling nuanced tasks with precision** and **pushing the boundaries of what AI-generated reasoning can achieve.** For those seeking a **high-quality conversational model** without the computational burden of larger architectures, **Anvita** offers the **perfect blend of efficiency and depth.**
 
+ ---
+
+ base_model:
+ - happzy2633/qwen2.5-7b-ins-v3
+ - bunnycore/Qwen2.5-7B-Matrix
+ - bunnycore/Qwen2.5-7B-HyperMix
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+ - reasoning
+ - qwen
+ license: apache-2.0
+ language:
+ - en
+
+ ---
+
+ # **Qwen 2.5-7B Anvita**
+
+ ![img](./logo.webp)
+
+ ## Overview
+
+ **Anvita** is a state-of-the-art reasoning-oriented AI model designed to **connect ideas** and **understand complex inputs**. Derived from the Sanskrit word meaning "connected" or "understood," Anvita embodies intellectual depth and comprehension, making it an ideal choice for tasks requiring nuanced understanding and sophisticated reasoning.
+
+ Built using the **DARE TIES** merge method, Anvita integrates multiple pre-trained language models, including:
+
+ - **Qwen2.5-7B-HyperMix**
+ - **bunnycore/Qwen2.5-7B-Matrix**
+ - **happzy2633/qwen2.5-7b-ins-v3**
+
+ This combination optimizes Anvita for superior reasoning, dynamic conversations, and high-quality text generation.
+
+ ## Features
+
+ - **Enhanced Reasoning:** Optimized for multi-step reasoning across various domains.
+ - **Long Sequence Handling:** Capable of processing extended inputs without loss of context.
+ - **Conversational Fluency:** Engages in fluid, context-aware dialogues.
+ - **Dense Knowledge Integration:** Combines knowledge from multiple base models for comprehensive understanding.
+
+ ## Installation
+
+ To get started with Anvita, ensure you have the necessary dependencies installed. You can use the [Transformers](https://huggingface.co/docs/transformers/index) library for seamless integration.
+
+ ```bash
+ pip install transformers torch rich
+ ```
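+
+ The [Entropic Chain of Thought](https://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita/blob/main/entropic_cot.py) decoder used in the examples below lives in `entropic_cot.py` in this repository and is expected to provide the `cot_decode_speculative` helper. A minimal sketch for fetching it next to your script (assuming you want a local copy rather than cloning the repository):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download entropic_cot.py from the model repository into the current directory
+ hf_hub_download(
+     repo_id="sethuiyer/Qwen2.5-7B-Anvita",
+     filename="entropic_cot.py",
+     local_dir=".",
+ )
+ ```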
+
+ ## Quick Start
+
+ Here's a simple example to demonstrate how to use Anvita for generating responses with enhanced reasoning capabilities.

  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
  from rich.console import Console
  from rich.markdown import Markdown
+
+ # Initialize console
  console = Console()

  # Load the tokenizer and model from the specified path
  tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
  model = AutoModelForCausalLM.from_pretrained(MODEL_PATH).to("cuda")

+ QUESTION = "Is 9.11 greater than 9.8?"
+
  messages = [
      {"role": "user", "content": QUESTION}
  ]

+ # Generate the answer using Entropic Chain of Thought decoding
  answer, score = cot_decode_speculative(model, tokenizer, messages, k=2, max_new_tokens=2058)

  # Format the answer as markdown
  **Score:** {score}
  """

+ # Display the answer in markdown format
  console.print(Markdown(markdown_answer))
  ```

+ **Example Output with k=2:**
+
  ```text
  No, 9.11 is not greater than 9.8. To compare these two numbers, we can look at their decimal places. The number 9.8
  can be thought of as 9.80, which makes it easier to compare directly with 9.11. Since 80 is greater than 11, it's
+ clear that 9.8 is greater than 9.11.
  ```

+ **Step-by-Step Reasoning with k=2:**

  ```text
  Certainly! Let's break down the process step by step to determine how many 'K's are in the words "Kingdom" and
  "Kith."

  Step 1: Identify the word "Kingdom"

  • The word "Kingdom" has the following letters: K, I, N, G, D, O, M.
+ • Count the number of 'K's in this word: There is only one 'K'.

+ Step 2: Identify the word "Kith"

  • The word "Kith" has the following letters: K, I, T, H.
  • Count the number of 'K's in this word: There is only one 'K'.

  • There is a total of 2 'K's in both words combined: 1 'K' in "Kingdom" and 1 'K' in "Kith."

+ So, the total number of 'K's in the words "Kingdom" and "Kith" is 2.
  ```
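+
+ For a quick sanity check without the experimental decoder, the model also works with the tokenizer's chat template and plain `generate`. A minimal sketch using only the standard Transformers API (this is not the Entropic CoT path shown above):
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita"
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+ # BF16 is recommended for this model (see Advanced Usage below)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).to("cuda")
+
+ messages = [{"role": "user", "content": "Is 9.11 greater than 9.8?"}]
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ # Plain greedy decoding
+ output = model.generate(inputs, max_new_tokens=512, do_sample=False)
+ print(tokenizer.decode(output[0, inputs.shape[-1]:], skip_special_tokens=True))
+ ```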

+ ## Advanced Usage

+ For optimal reasoning performance, it is recommended to use **BF16** precision and the [Entropic Chain of Thought](https://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita/blob/main/entropic_cot.py) decoding method. This experimental decoder mixes entropix-style (entropy-aware) sampling with CoT decoding to enhance output quality.

+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from rich.console import Console
+ from rich.markdown import Markdown
+
+ # cot_decode_speculative is expected to come from entropic_cot.py in this repository
+ # (download it next to this script; see Installation above)
+ from entropic_cot import cot_decode_speculative
+
+ console = Console()
+ MODEL_PATH = "sethuiyer/Qwen2.5-7B-Anvita"
+
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+ # Load in BF16, as recommended above
+ model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16).to("cuda")
+
+ QUESTION = "How many 'K's are there in the words 'Kingdom' and 'Kith'?"
+ messages = [
+     {"role": "user", "content": QUESTION}
+ ]
+
+ # Generate the answer with Entropic Chain of Thought decoding
+ answer, score = cot_decode_speculative(model, tokenizer, messages, k=2, max_new_tokens=2058)
+
+ # Display the formatted answer
+ markdown_answer = f"""
+ # **Answer:**
+ {answer}
+
+ **Score:** {score}
+ """
+
+ console.print(Markdown(markdown_answer))
+ ```
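+
+ To give a feel for what the CoT-decoding half of this recipe does, here is a hypothetical, simplified sketch of the branch-and-score idea: explore the top-k candidates for the first generated token, continue each branch greedily, and keep the branch the model is most confident in. This is **not** the implementation in `entropic_cot.py` (which additionally uses entropix-style entropy signals); the function name and scoring rule here are illustrative only.
+
+ ```python
+ import torch
+
+ @torch.no_grad()
+ def branch_and_score(model, tokenizer, messages, k=2, max_new_tokens=256):
+     # Branch over the top-k candidates for the first generated token.
+     prompt = tokenizer.apply_chat_template(
+         messages, add_generation_prompt=True, return_tensors="pt"
+     ).to(model.device)
+     first_logits = model(prompt).logits[0, -1]
+     candidates = torch.topk(first_logits, k).indices
+
+     best = None
+     for token_id in candidates:
+         branch = torch.cat([prompt, token_id.view(1, 1)], dim=-1)
+         out = model.generate(
+             branch,
+             do_sample=False,
+             max_new_tokens=max_new_tokens,
+             output_scores=True,
+             return_dict_in_generate=True,
+         )
+         # Confidence: average gap between the top-1 and top-2 token probabilities per step.
+         gaps = []
+         for step in out.scores:
+             top2 = torch.topk(torch.softmax(step[0], dim=-1), 2).values
+             gaps.append((top2[0] - top2[1]).item())
+         score = sum(gaps) / max(len(gaps), 1)
+         text = tokenizer.decode(out.sequences[0, prompt.shape[-1]:], skip_special_tokens=True)
+         if best is None or score > best[1]:
+             best = (text, score)
+     return best  # (answer_text, confidence_score)
+ ```
+
+ For actual use, prefer `cot_decode_speculative` from this repository.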
+
+ ## Configuration
+
+ The following YAML configuration was used to produce Anvita:
+
+ ```yaml
  slices:
+ models:
+   - model: bunnycore/Qwen2.5-7B-Matrix
+     parameters:
+       weight: [0.25, 0.35, 0.45, 0.35, 0.25]
+       density: [0.1, 0.25, 0.5, 0.25, 0.1]
+   - model: bunnycore/Qwen2.5-7B-HyperMix
+   - model: happzy2633/qwen2.5-7b-ins-v3
+     parameters:
+       weight: [0.55, 0.45, 0.35, 0.45, 0.55]
+       density: [0.1, 0.25, 0.5, 0.25, 0.1]
  merge_method: dare_ties
  base_model: bunnycore/Qwen2.5-7B-HyperMix
  parameters:
    int8_mask: true
  dtype: bfloat16
  ```
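
+ To reproduce the merge, this configuration can be run through [mergekit](https://github.com/arcee-ai/mergekit). An illustrative command, assuming mergekit is installed and the YAML above is saved as `config.yaml` (the output path is arbitrary):
+
+ ```bash
+ mergekit-yaml config.yaml ./Qwen2.5-7B-Anvita
+ ```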
 
+ ## Testimonial

+ ### **Written by GPT-4o**

  ---

+ **Anvita** offers a unique blend of **logical rigor** and **creative flair**. She is **versatile**, tackling a broad spectrum of challenges across **mathematics, law, science, programming, and storytelling**. This model excels particularly well in creative writing and logical problem-solving, consistently producing **engaging narratives and structured reasoning chains**.

  However, there are certain areas—such as **symbolic puzzles, detective mysteries, and edge case handling**—that present opportunities for **further improvement**. Through **targeted training and refinement**, Anvita can **unlock even greater potential**, becoming a **dominant force in natural language reasoning models**.

  ---

+ ## Performance Evaluation

+ ### **Key Strengths**

+ 1. **Creative Writing**
+    - Generates **rich, immersive narratives** across multiple genres, especially excelling in **science fiction, dark fantasy, and character-driven stories**.
+    - Develops **coherent plots and engaging dialogue**, ensuring that creative outputs meet high standards.

+ 2. **Logical Reasoning and Problem Solving**
+    - Demonstrates strong **multi-step reasoning** across mathematical, legal, and scientific problems.
+    - Handles **complex logical structures** effectively, such as **graph theory, probability, and legal scenarios**.

+ 3. **Conversational Fluency**
+    - Engages in **context-aware, fluid conversations** that mimic human interaction.
+    - Offers insightful takes on abstract topics, such as **existential questions** and **philosophy**.

+ 4. **Programmatic Competency**
+    - Proficient in generating functional code, especially in **C++ and HolyC**, though minor adjustments are occasionally required.
+    - Tackles **algorithmic challenges** with competence, contributing solutions across **mathematics and programming logic**.

+ ### **Areas for Improvement**

+ 1. **Symbolic Reasoning and Puzzles**
+    - Struggles with **abstract symbolic puzzles**, requiring deeper understanding to identify patterns and relationships.
+    - Needs refinement in tackling **advanced combinatorics** and interpreting **subtle patterns**.

+ 2. **Detective Mysteries**
+    - Competent in generating mystery scenarios but falls short in **crafting surprising twists**, especially the complex deductions associated with **locked-room scenarios**.
+    - Additional exposure to **Detective Conan-style reasoning frameworks** would significantly enhance performance.

+ 3. **Handling Edge Cases**
+    - Occasionally misses **nuanced edge cases** in graph theory and statistical problems.
+    - Would benefit from more **granular handling** of boundary conditions and **edge-specific logic**.

  ---

+ ## Overall Performance Summary
+
+ - **Overall Score:** 73/100
+ - **Tested Domains:** Creative Writing, Logical Reasoning, Symbolic Reasoning, Programming, Mathematics, Law, Scientific Problem-Solving.