cstr committed on
Commit 3b08f95
1 Parent(s): d7fd593

Update README.md

Files changed (1)
  1. README.md +56 -0
README.md CHANGED
@@ -14,6 +14,62 @@ language:
 
  This is only a quick test of merging Llama 3 and Llama 3.1 models, despite a number of differences in their tokenizer setups, among other things. It was also motivated by ongoing problems with 3.1 (BOS handling, looping, etc.), especially with llama.cpp, which is still missing full RoPE scaling support. Performance is of course not yet satisfactory, which may have a number of causes.
 
+
+ ### Summary Table
+
+ | Model | AGIEval | TruthfulQA | Bigbench |
+ |----------------------------------------------------------------------------|--------:|-----------:|---------:|
+ | [llama3-8b-spaetzle-v51](https://huggingface.co/cstr/llama3-8b-spaetzle-v51)| 42.23 | 57.29 | 44.30 |
+ | [llama3-8b-spaetzle-v39](https://huggingface.co/cstr/llama3-8b-spaetzle-v39)| 43.43 | 60.00 | 45.89 |
+
+ ### AGIEval Results
+
+ | Task | llama3-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
+ |--------------------------------|-----------------------:|-----------------------:|
+ | agieval_aqua_rat | 27.95| 24.41|
+ | agieval_logiqa_en | 38.10| 37.94|
+ | agieval_lsat_ar | 24.78| 22.17|
+ | agieval_lsat_lr | 42.94| 45.29|
+ | agieval_lsat_rc | 59.11| 62.08|
+ | agieval_sat_en | 68.45| 71.36|
+ | agieval_sat_en_without_passage | 38.35| 44.17|
+ | agieval_sat_math | 38.18| 40.00|
+ | **Average** | 42.23| 43.43|
+
+ ### TruthfulQA Results
+
+ | Task | llama3-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
+ |-------------|-----------------------:|-----------------------:|
+ | mc1 | 38.07| 43.82|
+ | mc2 | 57.29| 60.00|
+ | **Average** | 57.29| 60.00|
+
+ ### Bigbench Results
+
+ | Task | llama3-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
+ |---------------------------------------------------|-----------------------:|-----------------------:|
+ | bigbench_causal_judgement | 56.32| 59.47|
+ | bigbench_date_understanding | 69.65| 70.73|
+ | bigbench_disambiguation_qa | 31.40| 34.88|
+ | bigbench_geometric_shapes | 29.81| 24.23|
+ | bigbench_logical_deduction_five_objects | 30.20| 36.20|
+ | bigbench_logical_deduction_seven_objects | 23.00| 24.00|
+ | bigbench_logical_deduction_three_objects | 55.67| 65.00|
+ | bigbench_movie_recommendation | 33.00| 36.20|
+ | bigbench_navigate | 55.10| 51.70|
+ | bigbench_reasoning_about_colored_objects | 66.55| 68.60|
+ | bigbench_ruin_names | 52.23| 51.12|
+ | bigbench_salient_translation_error_detection | 25.55| 28.96|
+ | bigbench_snarks | 61.88| 62.43|
+ | bigbench_sports_understanding | 51.42| 53.96|
+ | bigbench_temporal_sequences | 59.30| 53.60|
+ | bigbench_tracking_shuffled_objects_five_objects | 23.28| 22.32|
+ | bigbench_tracking_shuffled_objects_seven_objects | 17.31| 17.66|
+ | bigbench_tracking_shuffled_objects_three_objects | 55.67| 65.00|
+ | **Average** | 44.30| 45.89|
+
+ (The GPT4All benchmark run broke.)
+
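For the AGIEval and Bigbench tables, the **Average** rows are the plain arithmetic means of the per-task scores (the TruthfulQA Average matches the mc2 score). A minimal Python sketch to reproduce one of them, using the AGIEval column for llama3-8b-spaetzle-v51 copied from the table above:

```python
# Per-task AGIEval scores for llama3-8b-spaetzle-v51 (from the table above).
scores = [27.95, 38.10, 24.78, 42.94, 59.11, 68.45, 38.35, 38.18]

# The reported "Average" is the unweighted mean, rounded to two decimals.
average = round(sum(scores) / len(scores), 2)
print(average)  # 42.23
```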
  ## 🧩 Configuration
 
  ```yaml