---
base_model:
- Meta-Llama-3.1-8B-Instruct
tags:
- merge
- mergekit
license: llama3.1
language:
- en
- de
---

# llama3.1-8b-spaetzle-v51

This is only a quick experiment in merging Llama 3 and Llama 3.1 models, despite a number of differences between them, in the tokenizer setup among other things. It was also motivated by ongoing problems with 3.1, especially under llama.cpp: BOS handling, looping output, and still-missing full RoPE scaling support. Performance is, of course, not yet satisfactory, which might have a number of causes.
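Given the tokenizer differences between the two base families, a quick sanity check of the merged model's special tokens can help rule out BOS-related looping. A minimal diagnostic sketch (the expected IDs are the standard Llama 3 values):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("cstr/llama3-8b-spaetzle-v51")
# Llama 3 family models normally use <|begin_of_text|> (id 128000) as BOS.
print(tok.bos_token, tok.bos_token_id)
print(tok.eos_token, tok.eos_token_id)

# The chat template should insert BOS exactly once; a duplicated BOS is a
# common symptom of the 3 / 3.1 tokenizer-config mismatches mentioned above.
ids = tok.apply_chat_template([{"role": "user", "content": "hi"}])
print(ids.count(tok.bos_token_id))  # expected: 1
```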


### Summary Table

|                                   Model                                    | AGIEval | TruthfulQA | Bigbench |
|----------------------------------------------------------------------------|--------:|-----------:|---------:|
| [llama3.1-8b-spaetzle-v51](https://huggingface.co/cstr/llama3-8b-spaetzle-v51)|   42.23 |      57.29 |    44.30 |
| [llama3-8b-spaetzle-v39](https://huggingface.co/cstr/llama3-8b-spaetzle-v39)  |   43.43 |      60.00 |    45.89 |

### AGIEval Results

|             Task             | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|------------------------------|-----------------------:|-----------------------:|
| agieval_aqua_rat             |                   27.95|                   24.41|
| agieval_logiqa_en            |                   38.10|                   37.94|
| agieval_lsat_ar              |                   24.78|                   22.17|
| agieval_lsat_lr              |                   42.94|                   45.29|
| agieval_lsat_rc              |                   59.11|                   62.08|
| agieval_sat_en               |                   68.45|                   71.36|
| agieval_sat_en_without_passage|                   38.35|                   44.17|
| agieval_sat_math             |                   38.18|                   40.00|
| **Average**                  |                   42.23|                   43.43|

### TruthfulQA Results

|    Task     | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|-------------|-----------------------:|-----------------------:|
| mc1         |                   38.07|                   43.82|
| mc2         |                   57.29|                   60.00|
| **Average** |                   57.29|                   60.00|

### Bigbench Results

|                      Task                      | llama3.1-8b-spaetzle-v51 | llama3-8b-spaetzle-v39 |
|------------------------------------------------|-----------------------:|-----------------------:|
| bigbench_causal_judgement                      |                   56.32|                   59.47|
| bigbench_date_understanding                    |                   69.65|                   70.73|
| bigbench_disambiguation_qa                     |                   31.40|                   34.88|
| bigbench_geometric_shapes                      |                   29.81|                   24.23|
| bigbench_logical_deduction_five_objects        |                   30.20|                   36.20|
| bigbench_logical_deduction_seven_objects       |                   23.00|                   24.00|
| bigbench_logical_deduction_three_objects       |                   55.67|                   65.00|
| bigbench_movie_recommendation                  |                   33.00|                   36.20|
| bigbench_navigate                              |                   55.10|                   51.70|
| bigbench_reasoning_about_colored_objects       |                   66.55|                   68.60|
| bigbench_ruin_names                            |                   52.23|                   51.12|
| bigbench_salient_translation_error_detection   |                   25.55|                   28.96|
| bigbench_snarks                                |                   61.88|                   62.43|
| bigbench_sports_understanding                  |                   51.42|                   53.96|
| bigbench_temporal_sequences                    |                   59.30|                   53.60|
| bigbench_tracking_shuffled_objects_five_objects|                   23.28|                   22.32|
| bigbench_tracking_shuffled_objects_seven_objects|                   17.31|                   17.66|
| bigbench_tracking_shuffled_objects_three_objects|                   55.67|                   65.00|
| **Average**                                    |                   44.30|                   45.89|

(The GPT4All benchmark run failed, so no results are reported for it.)

## 🧩 Configuration

```yaml
models:
  - model: cstr/llama3-8b-spaetzle-v34
    # no parameters necessary for base model
  - model: sparsh35/Meta-Llama-3.1-8B-Instruct
    parameters:
      density: 0.65  # fraction of the 3.1 delta weights retained by DARE
      weight: 0.5    # relative strength of the retained deltas in the merge
merge_method: dare_ties
base_model: cstr/llama3-8b-spaetzle-v34
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
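To reproduce the merge, the configuration above can be fed to mergekit's CLI. A minimal sketch, assuming it is saved as `config.yaml` (flag names as in current mergekit releases; adjust to your install):

```python
# Install mergekit and run the merge defined in config.yaml.
!pip install -qU mergekit
!mergekit-yaml config.yaml ./llama3.1-8b-spaetzle-v51 --cuda --lazy-unpickle
```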

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/llama3-8b-spaetzle-v51"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the conversation with the model's chat template and append the
# generation prompt so the model answers as the assistant.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model across available devices and sample a response.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```