---
base_model:
- Meta-Llama-3.1-8B-Instruct
tags:
- merge
- mergekit
license: llama3.1
language:
- en
- de
---

# llama3.1-8b-spaetzle-v51

This is only a quick experiment in merging Llama 3 and Llama 3.1 models, despite a number of differences in their tokenizer setups, among other things. It is also motivated by ongoing problems with 3.1 (BOS handling, looping, etc.), especially with llama.cpp, which at the time still lacked full RoPE scaling support. Performance is not yet satisfactory, of course, which might have a number of causes.

GGUF conversion is (for another test purpose) done with an old llama.cpp binary (b2750) and

```
--leave-output-tensor --token-embedding-type f16
```

### Summary Table

| Model | AGIEval | TruthfulQA | Bigbench |
|-------|--------:|-----------:|---------:|
| [llama3.1-8b-spaetzle-v51](https://huggingface.co/cstr/llama3-8b-spaetzle-v51) | 42.23 | 57.29 | 44.30 |
| [llama3-8b-spaetzle-v39](https://huggingface.co/cstr/llama3-8b-spaetzle-v39) | 43.43 | 60.00 | 45.89 |

|
30 |
+
|
31 |
+
| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|
32 |
+
|------------------------------|-----------------------:|-----------------------:|
|
33 |
+
| agieval_aqua_rat | 27.95| 24.41|
|
34 |
+
| agieval_logiqa_en | 38.10| 37.94|
|
35 |
+
| agieval_lsat_ar | 24.78| 22.17|
|
36 |
+
| agieval_lsat_lr | 42.94| 45.29|
|
37 |
+
| agieval_lsat_rc | 59.11| 62.08|
|
38 |
+
| agieval_sat_en | 68.45| 71.36|
|
39 |
+
| agieval_sat_en_without_passage| 38.35| 44.17|
|
40 |
+
| agieval_sat_math | 38.18| 40.00|
|
41 |
+
| **Average** | 42.23| 43.43|
|
42 |
+
|
43 |
+
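The reported averages are plain unweighted means over the subtask scores, which can be checked in a couple of lines of Python:

```python
# Unweighted mean over the eight AGIEval subtask scores of llama3.1-8b-spaetzle-v51.
scores = [27.95, 38.10, 24.78, 42.94, 59.11, 68.45, 38.35, 38.18]
average = round(sum(scores) / len(scores), 2)
print(average)  # → 42.23
```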
### TruthfulQA Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|------|-------------------------:|-----------------------:|
| mc1 | 38.07 | 43.82 |
| mc2 | 57.29 | 60.00 |
| **Average** | 57.29 | 60.00 |

### Bigbench Results

| Task | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|------|-------------------------:|-----------------------:|
| bigbench_causal_judgement | 56.32 | 59.47 |
| bigbench_date_understanding | 69.65 | 70.73 |
| bigbench_disambiguation_qa | 31.40 | 34.88 |
| bigbench_geometric_shapes | 29.81 | 24.23 |
| bigbench_logical_deduction_five_objects | 30.20 | 36.20 |
| bigbench_logical_deduction_seven_objects | 23.00 | 24.00 |
| bigbench_logical_deduction_three_objects | 55.67 | 65.00 |
| bigbench_movie_recommendation | 33.00 | 36.20 |
| bigbench_navigate | 55.10 | 51.70 |
| bigbench_reasoning_about_colored_objects | 66.55 | 68.60 |
| bigbench_ruin_names | 52.23 | 51.12 |
| bigbench_salient_translation_error_detection | 25.55 | 28.96 |
| bigbench_snarks | 61.88 | 62.43 |
| bigbench_sports_understanding | 51.42 | 53.96 |
| bigbench_temporal_sequences | 59.30 | 53.60 |
| bigbench_tracking_shuffled_objects_five_objects | 23.28 | 22.32 |
| bigbench_tracking_shuffled_objects_seven_objects | 17.31 | 17.66 |
| bigbench_tracking_shuffled_objects_three_objects | 55.67 | 65.00 |
| **Average** | 44.30 | 45.89 |

(The GPT4All benchmark run broke, so no results are reported for it.)

## 🧩 Configuration

```yaml
models:
  - model: cstr/llama3-8b-spaetzle-v34
    # no parameters necessary for base model
  - model: sparsh35/Meta-Llama-3.1-8B-Instruct
    parameters:
      density: 0.65
      weight: 0.5
merge_method: dare_ties
base_model: cstr/llama3-8b-spaetzle-v34
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```

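For intuition: `dare_ties` sparsifies each fine-tuned model's delta from the base before merging. With `density: 0.65`, roughly 35% of the delta weights are dropped at random and the survivors are rescaled by `1/density`, so the delta is preserved in expectation. A minimal pure-Python sketch of that drop-and-rescale step (an illustration, not mergekit's actual implementation; the TIES sign-consensus step is omitted):

```python
import random

def dare_drop_rescale(delta, density, rng):
    # Keep each delta weight with probability `density`, zero it otherwise,
    # and rescale survivors by 1/density so the expected value equals delta.
    return [d / density if rng.random() < density else 0.0 for d in delta]

rng = random.Random(0)
delta = [0.4, -0.2, 0.1, 0.3]                      # fine-tuned minus base weights
sparse = dare_drop_rescale(delta, 0.65, rng)
merged = [b + 0.5 * d for b, d in zip([0.0] * 4, sparse)]  # weight: 0.5, as in the config
```

Averaged over many draws, the rescaled sparse delta matches the original delta, which is why the merge can drop a large fraction of parameters without losing the fine-tune's direction.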
## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/llama3-8b-spaetzle-v51"
messages = [{"role": "user", "content": "What is a large language model?"}]

tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```