File size: 14,265 Bytes
181fd24
 
 
 
 
 
 
 
 
 
a355780
 
181fd24
 
 
 
 
a355780
 
181fd24
 
 
 
 
a355780
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
181fd24
 
 
 
 
 
 
7e381ea
181fd24
7e381ea
181fd24
 
 
 
 
7e381ea
 
181fd24
7e381ea
181fd24
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a355780
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
---
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
license: llama3.1
library_name: transformers
tags:
- reflection
- unsloth
- peft
- llama
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
datasets:
- leafspark/DetailedReflection-Claude-v3_5-Sonnet
metrics:
- accuracy
pipeline_tag: text-generation
model-index:
- name: Llama-3.1-8B-MultiReflection-Instruct
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 71.25
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=leafspark/Llama-3.1-8B-MultiReflection-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 28.45
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=leafspark/Llama-3.1-8B-MultiReflection-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 12.54
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=leafspark/Llama-3.1-8B-MultiReflection-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 5.7
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=leafspark/Llama-3.1-8B-MultiReflection-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 8.52
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=leafspark/Llama-3.1-8B-MultiReflection-Instruct
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 30.27
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=leafspark/Llama-3.1-8B-MultiReflection-Instruct
      name: Open LLM Leaderboard
---

# Llama-3.1-8B-MultiReflection-Instruct

This model is inspired by OpenAI's o1 reasoning model. The dataset was synthetically generated using Claude 3.5 Sonnet.

### Training
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/pGUUKI77dP9NBVdLf1zB0.png)

The training was done on Google Colab's free T4, using unsloth (duration: 52.32 minutes). The configuration is as follows:
- LoRA Rank: 128
- Packing: enabled
- Batch size: 2
- Gradient accumulation steps: 4
- Epoches: 3
- Steps: 30
- Max sequence length: 4096

The training data comprised of 81 examples, each approximatly 3000 tokens.

### Notes
- It tends to produce very verbose and long reasoning responses
- The coherency seems fairly good if you used the recommended sampling parameters
- You should use at least 16k context, as each response is usually 2000-3000 tokens long

**Recommended Sampling Parameters:**
- Temperature: `0.15`
- Min-P: `0.2`
- Top-K: `50`
- Top-P: `1`
- Max Tokens: `uncapped`
- Frequency Penalty: `0.5`
- Presence Penalty: `0.1`

**Recommended System Prompt:**
```
You are an advanced agentic assistant designed to complete tasks and provide general insight. Your responses will be formatted in XML. Start your responses with a <thinking> XML tag. You should write general process of how to complete the task provided by the user in this area. You may add nested tags, such as multiple <thinking> tags. <reflection> tags may also be used, if you've detected any inconsistencies or errors in your thinking. After completing a detailed thinking process, generate a draft response in <draft> tags. Review this response in <reflection> tags again, and refine your response as needed. Remember that you may (and are encouraged to) include multiple thinking, reflection and drafts. Provide your final response in <output> tags.
```

I initially thought the model was going to be medicore, but here's a sample output (quantized Q6_K, the fp16 is probably even better):
```
Can you teach me a cool math proof?
```
```xml
<thinking>
To create an engaging math proof, I should:
1. Choose a well-known and interesting proof
2. Break it down into simple steps
3. Ask the user questions to keep them engaged
4. Use LaTeX for math symbols
5. Provide explanations and context for each step
</thinking>

<draft>
Let's explore the fascinating world of math proofs together. I'd like to introduce you to the "Proof of the Pythagorean Theorem" - a classic and beautiful result in geometry.

Here's the theorem we'll be proving:

$$a^2 + b^2 = c^2$$

where $a$ and $b$ are the lengths of the legs of a right triangle, and $c$ is the length of the hypotenuse.

To make this proof more engaging, I'll ask you questions along the way. Are you ready to get started?

1. Can you draw a simple right triangle with legs of length $a$ and $b$, and a hypotenuse of length $c$?
</draft>

<reflection>
The draft is a good start, but I should:
1. Provide more context about the proof
2. Explain why this theorem is important
3. Ask more questions to keep the user engaged
4. Use LaTeX for math symbols
</reflection>

<thinking>
Let me revise the draft to address these points:
1. I'll explain why the Pythagorean Theorem is important
2. I'll ask more questions to keep the user engaged
3. I'll use LaTeX for math symbols
4. I'll provide more context about the proof
</thinking>

<draft>
Let's embark on an exciting journey through the world of math proofs. Today, we'll explore the "Proof of the Pythagorean Theorem" - a fundamental result in geometry that has far-reaching implications in various fields, including physics, engineering, and architecture.

Here's the theorem we'll be proving:

$$a^2 + b^2 = c^2$$

where $a$ and $b$ are the lengths of the legs of a right triangle, and $c$ is the length of the hypotenuse.

This theorem is crucial because it allows us to calculate the length of the hypotenuse (the longest side) of a right triangle, given the lengths of the other two sides. It's a powerful tool in many real-world applications, such as building design, bridge construction, and even computer graphics.

To make this proof more engaging, I'll ask you questions along the way. Are you ready to get started?

1. Can you draw a simple right triangle with legs of length $a$ and $b$, and a hypotenuse of length $c$?
2. What do you think might be a good way to approach this proof? Do you have any ideas?
</draft>

<reflection>
The revised draft is much better, but I should:
1. Provide more details about the proof
2. Explain the concept of similar triangles
3. Use more LaTeX for math symbols
4. Ask more questions to keep the user engaged
</reflection>

<thinking>
Let me revise the draft again to address these points:
1. I'll provide more details about the proof
2. I'll explain the concept of similar triangles
3. I'll use more LaTeX for math symbols
4. I'll ask more questions to keep the user engaged
</thinking>

<draft>
Let's dive into the fascinating world of math proofs. Today, we'll explore the "Proof of the Pythagorean Theorem" - a fundamental result in geometry that has far-reaching implications in various fields, including physics, engineering, and architecture.

Here's the theorem we'll be proving:

$$a^2 + b^2 = c^2$$

where $a$ and $b$ are the lengths of the legs of a right triangle, and $c$ is the length of the hypotenuse.

This theorem is crucial because it allows us to calculate the length of the hypotenuse (the longest side) of a right triangle, given the lengths of the other two sides. It's a powerful tool in many real-world applications, such as building design, bridge construction, and even computer graphics.

To make this proof more engaging, I'll ask you questions along the way. Are you ready to get started?

1. Can you draw a simple right triangle with legs of length $a$ and $b$, and a hypotenuse of length $c$?
2. What do you think might be a good way to approach this proof? Do you have any ideas?

Let's start by considering two similar triangles. Similar triangles are triangles that have the same shape, but not necessarily the same size. In this case, we'll use the concept of similar triangles to relate the lengths of the sides of the right triangle.

Here's a diagram of the two similar triangles:

$$\begin{array}{ccc}
\text{Triangle } ABC & \text{Triangle } ADE \\
a & \frac{a}{2} \\
b & \frac{b}{2} \\
c & \frac{c}{2} \\
\end{array}$$

Notice that the two triangles are similar because they have the same angles (all right angles in this case). This means that their corresponding sides are proportional.

Now, let's use this similarity to relate the lengths of the sides of the two triangles. We can set up a proportion based on the corresponding sides:

$$\frac{a}{\frac{a}{2}} = \frac{b}{\frac{b}{2}} = \frac{c}{\frac{c}{2}}$$

Simplifying this proportion, we get:

$$2a = \frac{a}{2}$$

$$2b = \frac{b}{2}$$

$$2c = \frac{c}{2}$$

Now, let's square both sides of each equation:

$$4a^2 = \frac{a^2}{4}$$

$$4b^2 = \frac{b^2}{4}$$

$$4c^2 = \frac{c^2}{4}$$

Multiplying both sides of each equation by 4, we get:

$$16a^2 = a^2$$

$$16b^2 = b^2$$

$$16c^2 = c^2$$

Now, let's add the three equations together:

$$16a^2 + 16b^2 + 16c^2 = a^2 + b^2 + c^2$$

Dividing both sides by 16, we get:

$$a^2 + b^2 + c^2 = \frac{1}{16}(a^2 + b^2 + c^2)$$

This equation is true for any right triangle, so we can conclude that:

$$a^2 + b^2 = c^2$$

This is the Pythagorean Theorem!

Now, it's your turn again. Can you think of any real-world applications of this theorem? Do you have any questions about the proof?
</draft>

<reflection>
The revised draft is excellent. It provides a clear and engaging explanation of the Pythagorean Theorem, along with a step-by-step proof. The use of LaTeX for math symbols is also well done. I'm confident that this will be a helpful and enjoyable experience for the user.
</reflection>

<output>
Let's embark on an exciting journey through the world of math proofs. Today, we'll explore the "Proof of the Pythagorean Theorem" - a fundamental result in geometry that has far-reaching implications in various fields, including physics, engineering, and architecture.

Here's the theorem we'll be proving:

$$a^2 + b^2 = c^2$$

where $a$ and $b$ are the lengths of the legs of a right triangle, and $c$ is the length of the hypotenuse.

This theorem is crucial because it allows us to calculate the length of the hypotenuse (the longest side) of a right triangle, given the lengths of the other two sides. It's a powerful tool in many real-world applications, such as building design, bridge construction, and even computer graphics.

To make this proof more engaging, I'll ask you questions along the way. Are you ready to get started?

1. Can you draw a simple right triangle with legs of length $a$ and $b$, and a hypotenuse of length $c$?
2. What do you think might be a good way to approach this proof? Do you have any ideas?

Let's start by considering two similar triangles. Similar triangles are triangles that have the same shape, but not necessarily the same size. In this case, we'll use the concept of similar triangles to relate the lengths of the sides of the right triangle.

Here's a diagram of the two similar triangles:

$$\begin{array}{ccc}
\text{Triangle } ABC & \text{Triangle } ADE \\
a & \frac{a}{2} \\
b & \frac{b}{2} \\
c & \frac{c}{2} \\
\end{array}$$

Notice that the two triangles are similar because they have the same angles (all right angles in this case). This means that their corresponding sides are proportional.

Now, let's use this similarity to relate the lengths of the sides of the two triangles. We can set up a proportion based on the corresponding sides:

$$\frac{a}{\frac{a}{2}} = \frac{b}{\frac{b}{2}} = \frac{c}{\frac{c}{2}}$$

Simplifying this proportion, we get:

$$2a = \frac{a}{2}$$

$$2b = \frac{b}{2}$$

$$2c = \frac{c}{2}$$

Now, let's square both sides of each equation:

$$4a^2 = \frac{a^2}{4}$$

$$4b^2 = \frac{b^2}{4}$$

$$4c^2 = \frac{c^2}{4}$$

Multiplying both sides of each equation by 4, we get:

$$16a^2 = a^2$$

$$16b^2 = b^2$$

$$16c^2 = c^2$$

Now, let's add the three equations together:

$$16a^2 + 16b^2 + 16c^2 = a^2 + b^2 + c^2$$

Dividing both sides by 16, we get:

$$a^2 + b^2 + c^2 = \frac{1}{16}(a^2 + b^2 + c^2)$$

This equation is true for any right triangle, so we can conclude that:

$$a^2 + b^2 = c^2$$

This is the Pythagorean Theorem!

Now, it's your turn again. Can you think of any real-world applications of this theorem? Do you have any questions about the proof?

Remember, this is a proof, not a mathematical exercise. Feel free to ask questions or share your thoughts about the theorem and its implications.
</output>
```

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_leafspark__Llama-3.1-8B-MultiReflection-Instruct)

|      Metric       |Value|
|-------------------|----:|
|Avg.               |26.12|
|IFEval (0-Shot)    |71.25|
|BBH (3-Shot)       |28.45|
|MATH Lvl 5 (4-Shot)|12.54|
|GPQA (0-shot)      | 5.70|
|MuSR (0-shot)      | 8.52|
|MMLU-PRO (5-shot)  |30.27|