Update README.md
README.md
CHANGED
@@ -9,16 +9,17 @@ tags: []
 
 Note: These results are with corrected parsing for BBH from Eleuther's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). See [this PR](https://github.com/EleutherAI/lm-evaluation-harness/pull/2013).
 
-### Smaug-Qwen2-72B-Instruct
-
 #### Overall:
 
-|Groups|Version| Filter |n-shot| Metric | |Value | |Stderr|
-
-|bbh |N/A |get-answer| 3|exact_match|↑ |0.8241|± |0.0042|
+|Model|Groups|Version| Filter |n-shot| Metric | |Value | |Stderr|
+|------||------|-------|----------|-----:|-----------|---|-----:|---|-----:|
+|Smaug-Qwen2-72B-Instruct|bbh |N/A |get-answer| 3|exact_match|↑ |0.8241|± |0.0042|
+|Qwen2-72B-Instruct|bbh |N/A |get-answer| 3|exact_match|↑ |0.8036|± |0.0044|
 
 #### Breakdown:
 
+Smaug-Qwen2-72B-Instruct:
+
 | Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
 |----------------------------------------------------------|-------|----------|-----:|-----------|---|-----:|---|-----:|
 |bbh |N/A |get-answer| 3|exact_match|↑ |0.8241|± |0.0042|
@@ -50,15 +51,7 @@ Note: These results are with corrected parsing for BBH from Eleuther's [lm-evalu
 | - bbh_cot_fewshot_web_of_lies | 2|get-answer| 3|exact_match|↑ |1.0000|± |0.0000|
 | - bbh_cot_fewshot_word_sorting | 2|get-answer| 3|exact_match|↑ |0.6560|± |0.0301|
 
-
-
-#### Overall:
-
-|Groups|Version| Filter |n-shot| Metric | |Value | |Stderr|
-|------|-------|----------|-----:|-----------|---|-----:|---|-----:|
-|bbh |N/A |get-answer| 3|exact_match|↑ |0.8036|± |0.0044|
-
-#### Breakdown:
+Qwen2-72B-Instruct:
 
 | Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
 |----------------------------------------------------------|-------|----------|-----:|-----------|---|-----:|---|-----:|