chargoddard committed
Commit 94e804a (1 parent: 8363335)
Update README.md

README.md CHANGED
@@ -20,6 +20,6 @@ QLoRA tuned from [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/
 
 My main reason for training this model was to investigate using an altered router balancing loss combined with the z-loss introduced in [ST-MoE: Designing Stable and Transferable Sparse Expert Models](https://arxiv.org/abs/2202.08906). The result is pretty decent, I think! It does a good job of respecting character information in system prompts and performed adequately on a few simple coding tasks.
 
-To train this I used a custom branch of Transformers that adds z-loss and reimplements the router balancing loss based on the version in [MegaBlocks](https://github.com/stanford-futuredata/megablocks).
+To train this I used a custom branch of Transformers that adds z-loss and reimplements the router balancing loss based on the version in [MegaBlocks](https://github.com/stanford-futuredata/megablocks). The config used with my custom hacked-up branch of axolotl is available [here](https://huggingface.co/chargoddard/MixtralRPChat-ZLoss/blob/main/axolotl_config.yml).
 
-Uses my favorite non-ChatML token-economic chat prompt format. Messages should be prefixed with `" ***System:"`, `" ***Query:"`, or `" ***Response:"` for system, user, and model messages respectively. No newlines are necessary but the space before the triple asterisk is mandatory.
+Uses my favorite non-ChatML token-economic chat prompt format. Messages should be prefixed with `" ***System:"`, `" ***Query:"`, or `" ***Response:"` for system, user, and model messages respectively. No newlines are necessary but the space before the triple asterisk is mandatory.
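
The prompt format described in the README text can be sketched in a few lines of Python. This is only an illustration, not code from the model repository: the helper name `build_prompt`, the role keys (`"system"`, `"user"`, `"assistant"`), and the single space placed between each prefix and its message text are all assumptions. The README only fixes the three prefixes and the mandatory leading space before the triple asterisk.

```python
# Prefixes as given in the README; the leading space is mandatory.
# The role keys on the left are hypothetical names chosen for this sketch.
ROLE_PREFIXES = {
    "system": " ***System:",
    "user": " ***Query:",
    "assistant": " ***Response:",
}


def build_prompt(messages, add_generation_prompt=True):
    """Join (role, text) pairs into a single prompt string.

    Assumption: one space separates a prefix from its message text.
    No newlines are inserted, since the README says none are necessary.
    """
    parts = []
    for role, text in messages:
        parts.append(ROLE_PREFIXES[role] + " " + text.strip())
    if add_generation_prompt:
        # End with a bare response prefix so the model continues from there.
        parts.append(ROLE_PREFIXES["assistant"])
    return "".join(parts)


if __name__ == "__main__":
    print(repr(build_prompt([
        ("system", "You are a pirate."),
        ("user", "Hi"),
    ])))
```

Because every prefix carries its own leading space, concatenating the pieces directly keeps the required space before each `***` without adding any newlines.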