---
base_model:
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
tags:
- merge
- mergekit
- lazymergekit
- cstr/llama3.1-8b-spaetzle-v85
- cstr/llama3.1-8b-spaetzle-v86
- cstr/llama3.1-8b-spaetzle-v74
license: llama3
language:
- en
- de
---

# llama3.1-8b-spaetzle-v90
llama3.1-8b-spaetzle-v90 is a progressive merge built up from several intermediate merges.

It scores 69.93 on EQ-Bench v2_de (171/171).

The merge tree involves the following models:

- NousResearch/Hermes-3-Llama-3.1-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- Dampfinchen/Llama-3.1-8B-Ultra-Instruct
- VAGOsolutions/Llama-3.1-SauerkrautLM-8b-Instruct
- akjindal53244/Llama-3.1-Storm-8B
- nbeerbower/llama3.1-gutenberg-8B
- Undi95/Meta-Llama-3.1-8B-Claude
- DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1
- nbeerbower/llama-3-wissenschaft-8B-v2
- Azure99/blossom-v5-llama3-8b
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- princeton-nlp/Llama-3-Instruct-8B-SimPO
- Locutusque/llama-3-neural-chat-v1-8b
- Locutusque/Llama-3-Orca-1.0-8B
- DiscoResearch/Llama3_DiscoLM_German_8b_v0.1_experimental
- seedboxai/Llama-3-Kafka-8B-v0.2
- VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct
- nbeerbower/llama-3-wissenschaft-8B-v2
- mlabonne/Daredevil-8B-abliterated-dpomix
The merge was built in a number of steps, including slerp-merging only the middle layers to compensate for tokenizer and chat template differences between the source models; an illustration is given below.

## 🧩 Configuration

The final merge step used the following configuration:
```yaml
models:
  - model: cstr/llama3.1-8b-spaetzle-v59
    # no parameters necessary for base model
  - model: cstr/llama3.1-8b-spaetzle-v85
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v86
    parameters:
      density: 0.65
      weight: 0.3
  - model: cstr/llama3.1-8b-spaetzle-v74
    parameters:
      density: 0.65
      weight: 0.3
merge_method: dare_ties
base_model: cstr/llama3.1-8b-spaetzle-v59
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
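For intuition on the `density` and `weight` values above: dare_ties sparsifies each model's delta against the base (keeping roughly `density` of the entries and rescaling them) before the TIES-style sign resolution and weighted sum. Below is a minimal, illustrative Python sketch of just that drop-and-rescale step; it is not mergekit's implementation, and the function names are invented for the example.

```python
# Illustrative sketch of the DARE drop-and-rescale step behind `density`/`weight`.
# Not mergekit's code; the TIES sign-consensus step is omitted for brevity.
import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor, density: float = 0.65) -> torch.Tensor:
    """Keep ~`density` of the fine-tune's delta at random and rescale what remains."""
    delta = finetuned - base
    keep = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * keep / density  # rescaling preserves the expected magnitude

def merge_dare(base: torch.Tensor, others: list[torch.Tensor], weights: list[float],
               density: float = 0.65) -> torch.Tensor:
    """Add each model's weighted, sparsified delta back onto the base tensor."""
    merged = base.clone()
    for tensor, w in zip(others, weights):
        merged += w * dare_delta(base, tensor, density)
    return merged
```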
One of the earlier steps was this slerp merge over the middle layers:

```yaml
models:
  - model: NousResearch/Hermes-3-Llama-3.1-8B
merge_method: slerp
base_model: cstr/llama3.1-8b-spaetzle-v74
parameters:
  t:
    - value: [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]
dtype: float16
```
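The `t` schedule above is zero at the outer layers (keeping the base model there) and peaks at 0.7 in the middle, which is what the slerp-only-the-middle-layers step refers to; mergekit spreads these anchor values across the model's layers. As a rough, illustrative sketch of what slerp between two weight tensors with a given `t` looks like (not mergekit's implementation):

```python
# Rough sketch of spherical linear interpolation (slerp) between two weight
# tensors; t = 0 returns the base tensor, t = 1 the other model's tensor.
import torch

def slerp(base: torch.Tensor, other: torch.Tensor, t: float, eps: float = 1e-8) -> torch.Tensor:
    a, b = base.flatten().float(), other.flatten().float()
    cos_theta = torch.dot(a, b) / (a.norm() * b.norm() + eps)
    theta = torch.arccos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    if torch.sin(theta).abs() < eps:  # nearly parallel vectors: fall back to lerp
        mixed = (1 - t) * a + t * b
    else:
        mixed = (torch.sin((1 - t) * theta) * a + torch.sin(t * theta) * b) / torch.sin(theta)
    return mixed.reshape(base.shape).to(base.dtype)

# Per-layer anchor values from the config: untouched at the ends, blended most in the middle.
t_schedule = [0, 0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0, 0]
```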
## 💻 Usage

Use the model with the standard Llama 3 chat template. The q4km (Q4_K_M) quants here are from [cstr/llama3.1-8b-spaetzle-v90](https://huggingface.co/cstr/llama3.1-8b-spaetzle-v90).
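As a minimal example with transformers, relying on the tokenizer's built-in chat template (repo id taken from above; generation settings are only illustrative defaults):

```python
# Minimal usage sketch with transformers; the tokenizer supplies the Llama 3
# chat template. Generation parameters are illustrative, not tuned values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cstr/llama3.1-8b-spaetzle-v90"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain briefly what a model merge is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```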