Update README.md
README.md
CHANGED
@@ -152,6 +152,8 @@ and merged in
 
 All of the layers were partitioned in to 9 random bins. Alternating models were slerped at [0...1], and [1...0] gradients; except attention, which was slerped at 0.03.
 
+This means that the model is still predominantly ordered around base mistral - including half of the input and output layers, and 28% of attention.
+
 ### Other
 
 Includes fast tokenizer.
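For readers unfamiliar with the terms in the README text above, here is a minimal sketch of spherical linear interpolation (slerp) and of a linear gradient of interpolation factors across layers. It is an illustration, not the script used for this merge: the function names `slerp` and `gradient` are hypothetical, weights are treated as flat Python lists, and the convention that `t = 0` returns the first model unchanged is an assumption.

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flat weight vectors.

    Assumed convention: t = 0 returns v0, t = 1 returns v1.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction, so fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two vectors, with the cosine clamped for safety.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_theta = max(-1.0, min(1.0, cos_theta))
    theta = math.acos(cos_theta)
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # Nearly parallel (or anti-parallel) vectors: use linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def gradient(start, end, num_layers):
    """Return per-layer interpolation factors running linearly from start to end."""
    if num_layers == 1:
        return [start]
    step = (end - start) / (num_layers - 1)
    return [start + i * step for i in range(num_layers)]
```

Under these assumptions, `gradient(0.0, 1.0, n)` corresponds to a `[0...1]` gradient and `gradient(1.0, 0.0, n)` to a `[1...0]` gradient, while a fixed factor such as `slerp(0.03, base, other)` keeps the result very close to the first argument.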