---
base_model:
- Meta-Llama-3.1-8B-Instruct
tags:
- merge
- mergekit
license: llama3.1
language:
- en
- de
---

# llama3.1-8b-spaetzle-v51

This is only a quick test of merging Llama 3 and Llama 3.1 models, despite a number of differences between them, including in tokenizer setup. It was also motivated by ongoing problems with Llama 3.1, especially under llama.cpp: BOS-token handling, looping output, still-missing full RoPE scaling support, and so on. Performance is of course not yet satisfactory, which might have a number of causes.

The GGUF conversion was (for a separate test purpose) done with an old llama.cpp binary (b2750), using:

```
--leave-output-tensor --token-embedding-type f16
```

### Summary Table

| Model | AGIEval | TruthfulQA | Bigbench |
|----------------------------------------------------------------------------|--------:|-----------:|---------:|
| [llama3.1-8b-spaetzle-v51](https://huggingface.co/cstr/llama3-8b-spaetzle-v51) | 42.23 | 57.29 | 44.30 |
| [llama3-8b-spaetzle-v39](https://huggingface.co/cstr/llama3-8b-spaetzle-v39) | 43.43 | 60.00 | 45.89 |

### AGIEval Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|--------------------------------|-----------------------:|-----------------------:|
| agieval_aqua_rat | 27.95 | 24.41 |
| agieval_logiqa_en | 38.10 | 37.94 |
| agieval_lsat_ar | 24.78 | 22.17 |
| agieval_lsat_lr | 42.94 | 45.29 |
| agieval_lsat_rc | 59.11 | 62.08 |
| agieval_sat_en | 68.45 | 71.36 |
| agieval_sat_en_without_passage | 38.35 | 44.17 |
| agieval_sat_math | 38.18 | 40.00 |
| **Average** | 42.23 | 43.43 |
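
The averages above are plain unweighted means over the eight AGIEval tasks. They can be reproduced with a short snippet (scores copied from the table):

```python
# Per-task AGIEval scores from the table above: (v51, v39).
scores = {
    "agieval_aqua_rat": (27.95, 24.41),
    "agieval_logiqa_en": (38.10, 37.94),
    "agieval_lsat_ar": (24.78, 22.17),
    "agieval_lsat_lr": (42.94, 45.29),
    "agieval_lsat_rc": (59.11, 62.08),
    "agieval_sat_en": (68.45, 71.36),
    "agieval_sat_en_without_passage": (38.35, 44.17),
    "agieval_sat_math": (38.18, 40.00),
}

# Unweighted mean per model, rounded to two decimals as in the table.
v51_avg = round(sum(v[0] for v in scores.values()) / len(scores), 2)
v39_avg = round(sum(v[1] for v in scores.values()) / len(scores), 2)
print(v51_avg, v39_avg)  # 42.23 43.43
```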

### TruthfulQA Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|-------------|-----------------------:|-----------------------:|
| mc1 | 38.07 | 43.82 |
| mc2 | 57.29 | 60.00 |
| **Average** | 57.29 | 60.00 |

### Bigbench Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|--------------------------------------------------|-----------------------:|-----------------------:|
| bigbench_causal_judgement | 56.32 | 59.47 |
| bigbench_date_understanding | 69.65 | 70.73 |
| bigbench_disambiguation_qa | 31.40 | 34.88 |
| bigbench_geometric_shapes | 29.81 | 24.23 |
| bigbench_logical_deduction_five_objects | 30.20 | 36.20 |
| bigbench_logical_deduction_seven_objects | 23.00 | 24.00 |
| bigbench_logical_deduction_three_objects | 55.67 | 65.00 |
| bigbench_movie_recommendation | 33.00 | 36.20 |
| bigbench_navigate | 55.10 | 51.70 |
| bigbench_reasoning_about_colored_objects | 66.55 | 68.60 |
| bigbench_ruin_names | 52.23 | 51.12 |
| bigbench_salient_translation_error_detection | 25.55 | 28.96 |
| bigbench_snarks | 61.88 | 62.43 |
| bigbench_sports_understanding | 51.42 | 53.96 |
| bigbench_temporal_sequences | 59.30 | 53.60 |
| bigbench_tracking_shuffled_objects_five_objects | 23.28 | 22.32 |
| bigbench_tracking_shuffled_objects_seven_objects | 17.31 | 17.66 |
| bigbench_tracking_shuffled_objects_three_objects | 55.67 | 65.00 |
| **Average** | 44.30 | 45.89 |

(The GPT4All benchmark run did not complete.)

## 🧩 Configuration

```yaml
models:
  - model: cstr/llama3-8b-spaetzle-v34
    # no parameters necessary for base model
  - model: sparsh35/Meta-Llama-3.1-8B-Instruct
    parameters:
      density: 0.65
      weight: 0.5
merge_method: dare_ties
base_model: cstr/llama3-8b-spaetzle-v34
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
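
To reproduce the merge, the configuration above is fed to mergekit's `mergekit-yaml` CLI. A minimal sketch that writes the config to disk and sanity-checks it first (requires `pyyaml`; the file and output-directory names are placeholders):

```python
import yaml  # pip install pyyaml

config = """\
models:
  - model: cstr/llama3-8b-spaetzle-v34
    # no parameters necessary for base model
  - model: sparsh35/Meta-Llama-3.1-8B-Instruct
    parameters:
      density: 0.65
      weight: 0.5
merge_method: dare_ties
base_model: cstr/llama3-8b-spaetzle-v34
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
"""

# Sanity-check that the YAML parses and carries the expected fields.
parsed = yaml.safe_load(config)
assert parsed["merge_method"] == "dare_ties"
assert parsed["base_model"] == "cstr/llama3-8b-spaetzle-v34"

with open("merge-config.yaml", "w") as f:
    f.write(config)

# Then run the merge from the shell, e.g.:
#   mergekit-yaml merge-config.yaml ./merged-model
```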

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/llama3-8b-spaetzle-v51"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```