thomasgauthier committed
Commit cf7b767
1 Parent(s): b29249f

Update README.md

Files changed (1):
  1. README.md +39 -47
README.md CHANGED
@@ -1,62 +1,54 @@
  ---
- base_model: []
- library_name: transformers
  tags:
- - mergekit
- - merge
-
  ---
- # output-model-directory
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method.

- ### Models Merged

- The following models were included in the merge:
- * /workspace/junk/singlestral5
- * /workspace/junk/singlestral6
- * /workspace/junk/singlestral1
- * /workspace/junk/singlestral3
- * /workspace/junk/singlestral4
- * /workspace/junk/singlestral7
- * /workspace/junk/singlestral0
- * /workspace/junk/singlestral2

- ### Configuration

- The following YAML configuration was used to produce this model:

  ```yaml
  models:
- - model: /workspace/junk/singlestral0
-   parameters:
-     weight: 0.125
- - model: /workspace/junk/singlestral1
-   parameters:
-     weight: 0.125
- - model: /workspace/junk/singlestral2
-   parameters:
-     weight: 0.125
- - model: /workspace/junk/singlestral3
-   parameters:
-     weight: 0.125
- - model: /workspace/junk/singlestral4
-   parameters:
-     weight: 0.125
- - model: /workspace/junk/singlestral5
-   parameters:
-     weight: 0.125
- - model: /workspace/junk/singlestral6
-   parameters:
-     weight: 0.125
- - model: /workspace/junk/singlestral7
-   parameters:
-     weight: 0.125
  merge_method: linear
  dtype: float16
  ```

  ---
+ license: apache-2.0
  tags:
+ - mixtral
+ - dense
+ - mistral
+ - expert
  ---

+ # Unmixtraled 22B 8x linear merge

+ > [!WARNING]
+ > This model outputs gibberish, as it was not trained under the dense configuration. Finetuning or merging is needed to make this model useful.

+ This is a 22B Mistral model recycling weights from [mistral-community/Mixtral-8x22B-v0.1](https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1).
+ The model was adapted from the Mixtral architecture to a dense Mistral architecture with the same number of layers, attention heads and hidden dimensions.
+ Embeddings, attention, layer norms and LM head weights were taken directly from the 8x22B model; the MLP weights are a linear merge of the weights of experts 0 to 7.

+ The following named weight correspondence was used (a code sketch follows the table):

+ | Mistral weight | Mixtral weight             |
+ |----------------|----------------------------|
+ | `gate_proj`    | `experts.{expert_num}.w1`  |
+ | `down_proj`    | `experts.{expert_num}.w2`  |
+ | `up_proj`      | `experts.{expert_num}.w3`  |
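
For illustration, here is a minimal sketch of this remapping (this is not the original conversion code; the helper names `remap_key` and `unmix` are hypothetical, and standard Hugging Face `MixtralForCausalLM`/`MistralForCausalLM` weight names are assumed):

```python
# Illustrative sketch: build a dense Mistral state dict from a Mixtral state dict
# by keeping embeddings, attention, layer norms and lm_head as-is, dropping the
# routers, and renaming one expert's MLP weights (w1/w2/w3 -> gate/down/up_proj).

def remap_key(mixtral_key: str, expert_num: int):
    """Return the dense Mistral name for a Mixtral weight, or None to drop it."""
    if ".block_sparse_moe.gate." in mixtral_key:
        return None  # router weights have no dense counterpart
    if ".block_sparse_moe.experts." in mixtral_key:
        marker = f".block_sparse_moe.experts.{expert_num}."
        if marker not in mixtral_key:
            return None  # belongs to a different expert than the one extracted
        renames = {"w1": "gate_proj", "w2": "down_proj", "w3": "up_proj"}
        prefix, suffix = mixtral_key.split(marker)  # e.g. "model.layers.0", "w1.weight"
        return f"{prefix}.mlp.{renames[suffix.split('.')[0]]}.weight"
    return mixtral_key  # embeddings, attention, layer norms, lm_head


def unmix(mixtral_state_dict: dict, expert_num: int) -> dict:
    """Collect the weights of a single expert into a dense Mistral state dict."""
    dense = {}
    for name, tensor in mixtral_state_dict.items():
        new_name = remap_key(name, expert_num)
        if new_name is not None:
            dense[new_name] = tensor
    return dense
```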
 
+ This mergekit configuration was used to merge the experts:

  ```yaml
  models:
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-0
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-1
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-2
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-3
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-4
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-5
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-6
+ - model: thomasgauthier/Unmixtraled-22B-v0.1-expert-7
  merge_method: linear
  dtype: float16
  ```
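
As a point of reference (not from the model card): with uniform weights, such as the 0.125 per expert used in the earlier configuration, a linear merge of the eight experts' MLP tensors is simply their element-wise mean:

```python
# Illustrative only: a uniform linear merge of eight tensors equals their mean.
import torch

experts = [torch.randn(4, 4) for _ in range(8)]  # stand-ins for one MLP weight from experts 0-7
merged = sum(0.125 * w for w in experts)         # weight 0.125 per expert
assert torch.allclose(merged, torch.stack(experts).mean(dim=0))
```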
+
+ ## Unmixtraled models
+ | Model | Source | Wikitext perplexity |
+ |-------|--------|---------------------|
+ | [Unmixtraled-22B-v0.1-expert-0](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-0) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 0 MLPs | 696.6932983398438 |
+ | [Unmixtraled-22B-v0.1-expert-1](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-1) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 1 MLPs | 6853.04248046875 |
+ | [Unmixtraled-22B-v0.1-expert-2](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-2) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 2 MLPs | 4689.181640625 |
+ | [Unmixtraled-22B-v0.1-expert-3](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-3) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 3 MLPs | 782.3755493164062 |
+ | [Unmixtraled-22B-v0.1-expert-4](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-4) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 4 MLPs | 2844.943603515625 |
+ | [Unmixtraled-22B-v0.1-expert-5](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-5) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 5 MLPs | 1099.32373046875 |
+ | [Unmixtraled-22B-v0.1-expert-6](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-6) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 6 MLPs | 341.5309753417969 |
+ | [Unmixtraled-22B-v0.1-expert-7](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-expert-7) | Mixtral 8x22B embed, attn, layernorm, lm_head + expert 7 MLPs | 2099.63818359375 |
+ | [**Unmixtraled-22B-v0.1-lerp**](https://huggingface.co/thomasgauthier/Unmixtraled-22B-v0.1-lerp) | **Mixtral 8x22B embed, attn, layernorm, lm_head + linear merge of expert 0-7 MLPs** | **1873.9874267578125** |
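
The exact evaluation setup behind these perplexities is not documented here; for reference, one plausible way to measure Wikitext perplexity with `transformers` looks roughly like the sketch below (dataset config, context length, stride and the example model id are assumptions):

```python
# Rough Wikitext perplexity sketch; not necessarily the setup used for the table above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thomasgauthier/Unmixtraled-22B-v0.1-lerp"  # any model from the table
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
model.eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids

window, stride = 2048, 2048  # non-overlapping windows; other setups are possible
total_nll, total_tokens = 0.0, 0
for start in range(0, input_ids.size(1) - 1, stride):
    chunk = input_ids[:, start : start + window].to(model.device)
    with torch.no_grad():
        loss = model(chunk, labels=chunk).loss  # mean NLL over chunk.size(1) - 1 targets
    n_targets = chunk.size(1) - 1
    total_nll += loss.item() * n_targets
    total_tokens += n_targets

print("wikitext perplexity:", torch.exp(torch.tensor(total_nll / total_tokens)).item())
```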