FluffyKaeloky committed
Commit
5462be9
1 Parent(s): 4d02594

Update README.md

Files changed (1)
  1. README.md +60 -45
README.md CHANGED
@@ -1,45 +1,60 @@
- ---
- base_model: []
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # LuminumMistral-123B
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the della_linear merge method, with mistralai/Mistral-Large-Instruct-2407 as the base.
-
- ### Models Merged
-
- The following models were included in the merge:
- * NeverSleep/Lumimaid-v0.2-123B
- * anthracite-org/magnum-v2-123b
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- models:
-   - model: anthracite-org/magnum-v2-123b
-     parameters:
-       weight: 0.19
-       density: 0.5
-   - model: NeverSleep/Lumimaid-v0.2-123B
-     parameters:
-       weight: 0.34
-       density: 0.8
- merge_method: della_linear
- base_model: mistralai/Mistral-Large-Instruct-2407
- parameters:
-   epsilon: 0.05
-   lambda: 1
-   int8_mask: true
- dtype: bfloat16
- ```
+ ---
+ base_model: []
+ library_name: transformers
+ tags:
+ - mergekit
+ - merge
+
+ ---
+ # LuminumMistral-123B
+
+ This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
+
+ ## Merge Details
+
+ I present Luminum.
+
+ This is a merge that uses Mistral Large as the base and includes Lumimaid-v0.2-123B and Magnum-v2-123B.
+ I felt like Magnum was rambling too much, and Lumimaid lost slightly too much brain power, so I went back to the Mistral Large base, but it was lacking some moistness.
+
+ On a whim, I decided to merge both Lumimaid and Magnum on top of Mistral Large, and while I wasn't expecting much, I've been very surprised with the results.
+
+ I've tested this model quite extensively at and above 32k context with great success. In theory it should handle the full 128k context, although I've only gone up to 40-50k max.
+ It's become my new daily driver.
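+
+ If you want to poke at it with plain transformers, a minimal loading sketch is below. The repo id (FluffyKaeloky/LuminumMistral-123B), the chat template usage, and the sampling settings are assumptions here, not a tested recipe; adjust them to whatever you actually downloaded, and keep in mind that a 123B model in bfloat16 wants several large GPUs, so most people will run a quantized build instead.
+
+ ```python
+ # Rough loading sketch, not a tested recipe: the repo id below is assumed,
+ # and a 123B model in bfloat16 needs multiple large GPUs (or a quantized build).
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ repo_id = "FluffyKaeloky/LuminumMistral-123B"  # assumed repo id, double-check it
+
+ tokenizer = AutoTokenizer.from_pretrained(repo_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     repo_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",  # shard the weights across whatever GPUs are visible
+ )
+
+ # Mistral-Large derivatives ship a chat template, so apply_chat_template should work.
+ messages = [{"role": "user", "content": "Write a short scene set in a rain-soaked harbor town."}]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.8)
+ print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```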
+
+
+ I'll update the model card and add artwork tomorrow; I'm tired.
+
+
+ ### Merge Method
+
+ This model was merged using the della_linear merge method, with mistralai/Mistral-Large-Instruct-2407 as the base.
+
+ ### Models Merged
+
+ The following models were included in the merge:
+ * NeverSleep/Lumimaid-v0.2-123B
+ * anthracite-org/magnum-v2-123b
+
+ ### Configuration
+
+ The following YAML configuration was used to produce this model:
+
+ ```yaml
+ models:
+   - model: anthracite-org/magnum-v2-123b
+     parameters:
+       weight: 0.19
+       density: 0.5
+   - model: NeverSleep/Lumimaid-v0.2-123B
+     parameters:
+       weight: 0.34
+       density: 0.8
+ merge_method: della_linear
+ base_model: mistralai/Mistral-Large-Instruct-2407
+ parameters:
+   epsilon: 0.05
+   lambda: 1
+   int8_mask: true
+ dtype: bfloat16
+ ```
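+
+ If you want to reproduce the merge, a minimal sketch is below. It assumes mergekit is installed (`pip install mergekit`), that the YAML above is saved as `luminum.yaml`, and that you have enough disk and memory for three 123B checkpoints; the flags are just a reasonable starting point, see the mergekit docs for the full set.
+
+ ```python
+ # Minimal sketch: shell out to mergekit's CLI with the config above.
+ # Assumes `pip install mergekit` and that luminum.yaml contains the YAML from this card.
+ import subprocess
+
+ subprocess.run(
+     [
+         "mergekit-yaml",          # mergekit's command-line entry point
+         "luminum.yaml",           # the della_linear config shown above
+         "./LuminumMistral-123B",  # output directory for the merged weights
+         "--cuda",                 # do the tensor math on GPU if one is available
+         "--copy-tokenizer",       # carry the base model's tokenizer into the output
+         "--lazy-unpickle",        # lower peak memory while loading the source models
+     ],
+     check=True,
+ )
+ ```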