kenhktsui committed on
Commit
36c5712
1 Parent(s): 0f1b38c

Update README.md

  1. README.md +145 -1
README.md CHANGED
@@ -10,7 +10,151 @@ tags:
10
  license: apache-2.0
11
  language:
12
  - en
 
 
13
  ---
14
 
15
  # Uploaded model
16
 
@@ -20,4 +164,4 @@ language:
20
 
21
  This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
22
 
23
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
10
  license: apache-2.0
11
  language:
12
  - en
13
+ datasets:
14
+ - kenhktsui/longtalk-cot-v0.1
15
  ---
16
+ # kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged
17
+
18
+ After 1 epoch of SFT, the model scores 26% higher (relative) than the base model on the GSM-8k subset of ProcessBench. However, its score on GSM-8k itself is worse.
19
+ This seems to be either 1) the metrics not being measured correctly because of answer parsing, or 2) a sign of overfitting to answers, as it is not able to critique reasoning steps.
20
+
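One way to probe the parsing hypothesis is to run two answer extractors over the same response and see how often they disagree. The sketch below is only an illustration: `extract_boxed` and `extract_last_number` are hypothetical helpers, not part of any evaluation harness, and the idea that a GSM-8k scorer reads the last number in the output is an assumption. The sample response is made up for the example, modelled on the `\boxed{...}` ending the finetuned model tends to produce.

```python
import re
from typing import Optional

# Assumed number pattern: optional sign, digits with thousands commas, optional decimals.
_NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last boxed expression in text, or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def extract_last_number(text: str) -> Optional[str]:
    """Return the last number that appears anywhere in text, or None."""
    matches = re.findall(_NUMBER, text)
    return matches[-1].replace(",", "") if matches else None


# Made-up response: the boxed answer is followed by extra commentary.
response = (
    "So the total is 18 dollars.\n"
    "**Final Answer**\n"
    "\\[ \\boxed{18} \\]\n"
    "I verified this with 2 different approaches."
)

print(extract_boxed(response))        # 18
print(extract_last_number(response))  # 2
```

If a scorer behaved like `extract_last_number`, a long response that keeps talking after the boxed answer would be marked wrong even though the boxed answer is right.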
21
+ ||[kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged](https://huggingface.co/kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged)|[Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)|
22
+ |---|---|---|
23
+ |ProcessBench (GSM-8k)|**46.0**|36.5 (reported)|
24
+ |GSM-8k|78.54|**84.23**|
25
+
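The figures in the table can be turned into relative changes with a few lines of arithmetic. This is a minimal sketch; `relative_change` is a name chosen here for illustration, and the scores are copied from the table above.

```python
def relative_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (finetuned score, base score), taken from the table.
scores = {
    "ProcessBench (GSM-8k)": (46.0, 36.5),
    "GSM-8k": (78.54, 84.23),
}

for name, (finetuned, base) in scores.items():
    print(f"{name}: {relative_change(finetuned, base):+.1f}%")
# ProcessBench (GSM-8k): +26.0%
# GSM-8k: -6.8%
```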
26
+ This needs to be investigated further.
27
+ I have inspected how **model behavior changed** after SFT. As expected, the finetuned model thinks much longer and often solves a problem with multiple approaches instead of one.
28
+
29
+ ### Quick Test 1: Is 9.11 greater than 9.9?
30
+ <details>
31
+ <summary> response from kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf </summary>
32
+ No, 9.11 is not greater than 9.9. Let's think about this step by step.
33
+
34
+ First, it helps to understand that both numbers are decimal numbers, specifically they have two digits after the decimal point. So, 9.11 and 9.9 can be compared directly based on their
35
+ values in each place value: whole number part and the tenths and hundredths parts.
36
+
37
+ Let's break them down:
38
+
39
+ - For 9.11:
40
+ - The whole number part is 9.
41
+ - The tenths place is 1 (which means one tenth).
42
+ - The hundredths place is also 1 (one hundredth).
43
+
44
+ - For 9.9:
45
+ - The whole number part is also 9.
46
+ - But here, the tenths place is 9 (nine tenths), and there's no hundredths digit after that; it's considered to have zero in the hundredths place because decimal numbers with fewer digits
47
+ are filled with zeros.
48
+
49
+ So, comparing them:
50
+
51
+ - Both have the same whole number part: 9.
52
+ - For the tenth place:
53
+ - 1 vs. 9.
54
+
55
+ Since 9 is greater than 1, this already tells us that 9.9 is larger than 9.11. But to be thorough, let's look at the hundredths places as well:
56
+
57
+ - In 9.11: hundredths = 1
58
+ - In 9.9: hundredths = 0
59
+
60
+ Even though there are no digits in the hundredths place for 9.9, it's still considered zero because of the way decimals work.
61
+
62
+ Therefore, overall, 9.11 < 9.9.
63
+
64
+ Wait a minute, but I should make sure that I'm not missing anything here. Sometimes, people might think about adding or subtracting numbers differently, but in this case, since both numbers
65
+ have the same whole number part and one has a larger tenths digit, it's clear which is larger.
66
+
67
+ Another way to look at it is to consider them as percentages:
68
+
69
+ - 9.11 = 911%
70
+ - 9.9 = 990%
71
+
72
+ But that might be confusing; in percentage terms, 990% is still greater than 911%.
73
+
74
+ Alternatively, if we multiply both numbers by 100 to eliminate the decimal point:
75
+
76
+ - 9.11 * 100 = 911
77
+ - 9.9 * 100 = 990
78
+
79
+ Again, 990 is greater than 911.
80
+
81
+ So, no matter how we look at it, 9.9 is larger than 9.11.
82
+
83
+ I think that covers the comparison thoroughly. There doesn't seem to be any trick or special case here; it's a straightforward decimal comparison once you understand place values.
84
+
85
+ **Final Answer**
86
+
87
+ \[ \boxed{9.11 < 9.9} \]
88
+ </details>
89
+
90
+ <details>
91
+ <summary> response from Qwen/Qwen2.5-7B-Instruct-GGUF </summary>
92
+ No, 9.11 is not greater than 9.9. When comparing decimal numbers, you can compare them digit by digit from left to right. In this case:
93
+
94
+ - The whole number part of both numbers is the same (9).
95
+ - Comparing the tenths place, we see that 1 in 9.11 is larger than 9 in 9.9.
96
+ - Therefore, 9.11 > 9.9.
97
+
98
+ So, 9.11 is greater than 9.9.
99
+ </details>
100
+
101
+
102
+ ### Quick Test 2: How many letters R in the word strawberry?
103
+ <details>
104
+ <summary> response from kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf </summary>
105
+ Let's see. The word is "strawberry." I need to count how many 'R' letters are there in this word.
106
+
107
+ First, let me write down the word again: s-t-r-a-w-b-e-r-r-y.
108
+
109
+ Now, I'll go through each letter and check if it's an 'R'.
110
+
111
+ Starting with the first letter: S - that's not R.
112
+
113
+ Second letter: T - also not R.
114
+
115
+ Third letter: R - yes, that's an 'R'. So, one so far.
116
+
117
+ Fourth letter: A - nope.
118
+
119
+ Fifth letter: W - still not R.
120
+
121
+ Sixth letter: B - not R either.
122
+
123
+ Seventh letter: E - no.
124
+
125
+ Eighth letter: R - another 'R'! That makes two now.
126
+
127
+ Ninth letter: B - no R here.
128
+
129
+ Tenth letter: E - no R.
130
+
131
+ Eleventh letter: R - yet again, an 'R'. Now it's three.
132
+
133
+ Twelfth letter: Y - not R.
134
+
135
+ So, counting all the way through, I've found three 'R' letters in "strawberry."
136
+
137
+ Wait a minute, let me double-check. Maybe there are more 'R's that I missed.
138
+
139
+ Looking at "s-t-r-a-w-b-e-r-r-y." Yes, I did find three: 3rd, 8th, and 11th positions.
140
+
141
+ I think that's it. There don't seem to be any other 'R's in the word.
142
+
143
+ Therefore, there are 3 letters R in the word "strawberry."
144
+
145
+ **Final Answer**
146
+
147
+ \[ \boxed{3} \]
148
+ </details>
149
+ <details>
150
+ <summary> response from Qwen/Qwen2.5-7B-Instruct-GGUF </summary>
151
+ The letter "R" appears 1 time in the word "strawberry".
152
+ </details>
153
+
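Both quick tests have answers that can be checked mechanically, which is handy when comparing many responses. A small sketch using only the Python standard library; the variable names are illustrative.

```python
from decimal import Decimal

# Quick Test 1: exact decimal comparison, avoiding float rounding.
is_greater = Decimal("9.11") > Decimal("9.9")

# Quick Test 2: count occurrences of the letter "r".
r_count = "strawberry".count("r")

print(is_greater)  # False
print(r_count)     # 3
```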
154
+ ### Try it out in Ollama!
155
+ ```
156
+ ollama run hf.co/kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf
157
+ ```
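Once the model has been pulled by the command above, it can also be queried from a script. The following is a sketch under assumptions, not a tested recipe for this model: it assumes an Ollama server is running locally on its default port (11434) and that the model is addressed by the same `hf.co/...` name used with `ollama run`. `build_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

MODEL = "hf.co/kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf"
# Assumption: local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama REST API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `ask("Is 9.11 greater than 9.9?")` should return the model's full answer as a single string.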
158
 
159
  # Uploaded model
160
 
 
164
 
165
  This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
166
 
167
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)