dranger003 committed on
Commit 01aa3c8
1 Parent(s): d333185

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -19,3 +19,9 @@ base_model: databricks/dbrx-instruct
 | Layers | Context | Template |
 | --- | --- | --- |
 | <pre> </pre> | <pre>32768</pre> | <pre>\<\|im_start\|\> system<br>{system} \<\|im_end\|\><br>\<\|im_start\|\> user<br>{prompt} \<\|im_end\|\><br>\<\|im_start\|\> assistant<br> </pre> |
+
+ * 16x12B MoE
+ * 16 experts (12B params per single expert; top_k=4 routing)
+ * 36B active params (132B total params)
+ * Trained on 12T tokens
+ * 32k sequence length training
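
The added bullets describe a mixture-of-experts layout in which each token is routed to 4 of 16 experts. A minimal sketch of what top-k routing generally looks like follows. It is not taken from the DBRX code: the function `route_token`, the example logits, and the choice to renormalize with a softmax over the selected experts are all assumptions made for illustration.

```python
import math

NUM_EXPERTS = 16  # from the README: 16 experts
TOP_K = 4         # from the README: top_k=4 routing


def route_token(router_logits, top_k=TOP_K):
    """Return {expert_index: weight} for the top_k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    This renormalization is an assumption, not a confirmed DBRX detail.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    # Indices of the top_k largest logits (ties broken by lower index).
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: (-router_logits[i], i))[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Hypothetical router scores for a single token, one per expert.
example_logits = [0.1 * ((7 * i) % NUM_EXPERTS) for i in range(NUM_EXPERTS)]
weights = route_token(example_logits)
```

With these made-up logits, `weights` holds four expert indices whose weights sum to 1. Four of 16 experts is 25% of the expert pool, while 36B of 132B total parameters is about 27%. The difference is presumably due to non-expert parameters (such as attention and embeddings) that every token uses, though the README does not break this down.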