dranger003
commited on
Commit
•
01aa3c8
1
Parent(s):
d333185
Update README.md
Browse files
README.md
CHANGED
@@ -19,3 +19,9 @@ base_model: databricks/dbrx-instruct
|
|
19 |
| Layers | Context | Template |
|
20 |
| --- | --- | --- |
|
21 |
| <pre> </pre> | <pre>32768</pre> | <pre>\<\|im_start\|\> system<br>{system} \<\|im_end\|\><br>\<\|im_start\|\> user<br>{prompt} \<\|im_end\|\><br>\<\|im_start\|\> assistant<br> </pre> |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
19 |
| Layers | Context | Template |
|
20 |
| --- | --- | --- |
|
21 |
| <pre> </pre> | <pre>32768</pre> | <pre>\<\|im_start\|\> system<br>{system} \<\|im_end\|\><br>\<\|im_start\|\> user<br>{prompt} \<\|im_end\|\><br>\<\|im_start\|\> assistant<br> </pre> |
|
22 |
+
|
23 |
+
* 16x12B MoE
|
24 |
+
* 16 experts (12B params per single expert; top_k=4 routing)
|
25 |
+
* 36B active params (132B total params)
|
26 |
+
* Trained on 12T tokens
|
27 |
+
* 32k sequence length training
|