ArkaAbacus committed on
Commit 6deb8e6 · verified · 1 parent: 84ccd91

Update README.md

Files changed (1)
  1. README.md +7 -14
README.md CHANGED
```diff
@@ -9,16 +9,17 @@ tags: []
 
 Note: These results are with corrected parsing for BBH from Eleuther's [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). See [this PR](https://github.com/EleutherAI/lm-evaluation-harness/pull/2013).
 
-### Smaug-Qwen2-72B-Instruct
-
 #### Overall:
 
-|Groups|Version| Filter |n-shot| Metric | |Value | |Stderr|
-|------|-------|----------|-----:|-----------|---|-----:|---|-----:|
-|bbh |N/A |get-answer| 3|exact_match|↑ |0.8241|± |0.0042|
+|Model|Groups|Version| Filter |n-shot| Metric | |Value | |Stderr|
+|------||------|-------|----------|-----:|-----------|---|-----:|---|-----:|
+|Smaug-Qwen2-72B-Instruct|bbh |N/A |get-answer| 3|exact_match|↑ |0.8241|± |0.0042|
+|Qwen2-72B-Instruct|bbh |N/A |get-answer| 3|exact_match|↑ |0.8036|± |0.0044|
 
 #### Breakdown:
 
+Smaug-Qwen2-72B-Instruct:
+
 | Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
 |----------------------------------------------------------|-------|----------|-----:|-----------|---|-----:|---|-----:|
 |bbh |N/A |get-answer| 3|exact_match|↑ |0.8241|± |0.0042|
@@ -50,15 +51,7 @@ Note: These results are with corrected parsing for BBH from Eleuther's [lm-evalu
 | - bbh_cot_fewshot_web_of_lies | 2|get-answer| 3|exact_match|↑ |1.0000|± |0.0000|
 | - bbh_cot_fewshot_word_sorting | 2|get-answer| 3|exact_match|↑ |0.6560|± |0.0301|
 
-### Qwen2-72B-Instruct
-
-#### Overall:
-
-|Groups|Version| Filter |n-shot| Metric | |Value | |Stderr|
-|------|-------|----------|-----:|-----------|---|-----:|---|-----:|
-|bbh |N/A |get-answer| 3|exact_match|↑ |0.8036|± |0.0044|
-
-#### Breakdown:
+Qwen2-72B-Instruct:
 
 | Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
 |----------------------------------------------------------|-------|----------|-----:|-----------|---|-----:|---|-----:|
```
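For a single subtask row, the `Stderr` values in these tables appear consistent with the standard error of a mean of 0/1 exact-match scores, computed with the sample (n - 1) standard deviation. A minimal sketch, assuming 250 examples for `bbh_cot_fewshot_word_sorting` and `bbh_cot_fewshot_web_of_lies` (the usual BBH subtask size; the README does not state n). The helper name `exact_match_stderr` and the score vector are hypothetical, used only for illustration:

```python
import math
import statistics


def exact_match_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) exact-match scores.

    With mean p, the sample variance of 0/1 scores using an (n - 1)
    denominator is n * p * (1 - p) / (n - 1). Dividing by n and taking
    the square root gives sqrt(p * (1 - p) / (n - 1)).
    """
    if n < 2:
        raise ValueError("need at least two examples")
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Assumption: 250 examples per subtask; the README does not state n.
N = 250

# Cross-check the closed form against a hypothetical 0/1 score vector
# with 164 correct answers out of 250 (164 / 250 = 0.656).
scores = [1] * 164 + [0] * 86
direct = statistics.stdev(scores) / math.sqrt(len(scores))

print(round(exact_match_stderr(0.6560, N), 4))  # word_sorting row -> 0.0301
print(round(direct, 4))                         # same value, computed directly
print(round(exact_match_stderr(1.0000, N), 4))  # web_of_lies row -> 0.0
```

The aggregated `bbh` row pools subtasks of different sizes, so its ± value may not follow this single-task formula directly.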
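The updated `Overall` table puts both models side by side (0.8241 ± 0.0042 vs 0.8036 ± 0.0044). A rough sketch for checking whether that gap exceeds the reported uncertainty, assuming the two estimates are independent. This is a simplification: both models answer the same BBH questions, so a paired test would typically be tighter. The function name `difference_z_score` is hypothetical:

```python
import math


def difference_z_score(a: float, se_a: float, b: float, se_b: float) -> float:
    """z statistic for a - b, treating the two estimates as independent."""
    return (a - b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Overall BBH exact_match (value, stderr) pairs from the Overall table.
smaug_qwen2 = (0.8241, 0.0042)
qwen2 = (0.8036, 0.0044)

z = difference_z_score(*smaug_qwen2, *qwen2)
print(f"difference = {smaug_qwen2[0] - qwen2[0]:.4f}, z = {z:.2f}")
# difference = 0.0205, z = 3.37
```

Under the independence assumption, |z| is above 1.96, so the gap would be significant at roughly the 5% level; comparing per-example scores from both runs would be the more careful check.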