
Built with Axolotl

QLoRA-tuned from mistralai/Mixtral-8x7B-v0.1.

My main reason for training this model was to investigate combining an altered router balancing loss with the z-loss introduced in ST-MoE: Designing Stable and Transferable Sparse Expert Models. The result is pretty decent, I think! It respects character information in system prompts well and performed adequately on a few simple coding tasks.
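For reference, the z-loss penalizes large router logits so that the gating softmax stays numerically stable. Below is a minimal PyTorch sketch of the formulation from the ST-MoE paper; the function name `router_z_loss` and the `[num_tokens, num_experts]` logit layout are illustrative choices, not code from the training branch.

```python
import torch

def router_z_loss(router_logits: torch.Tensor) -> torch.Tensor:
    """Router z-loss from ST-MoE: the mean over tokens of
    (logsumexp of the router logits)^2. Keeping the logits small
    stabilizes the gating softmax.

    router_logits: [num_tokens, num_experts]
    """
    log_z = torch.logsumexp(router_logits, dim=-1)  # [num_tokens]
    return (log_z ** 2).mean()
```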

To train this, I used a custom branch of Transformers that adds z-loss and reimplements the router balancing loss based on the version in MegaBlocks. The config used with my custom hacked-up branch of Axolotl is available here.
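For context, a Switch-Transformer-style balancing loss (which the MegaBlocks version follows in spirit) can be sketched as below. This is an illustrative approximation, not the exact code from my branch; `load_balancing_loss` and the `top_k` parameter are hypothetical names, and details such as the top-k handling may differ from the MegaBlocks implementation.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, top_k: int = 2) -> torch.Tensor:
    """Switch-style auxiliary loss: num_experts * sum_i(f_i * P_i), where
    f_i is the fraction of token routings sent to expert i and P_i is the
    mean router probability assigned to expert i. It reaches its minimum
    (value 1.0) when routing is perfectly uniform across experts.

    router_logits: [num_tokens, num_experts]
    """
    num_experts = router_logits.shape[-1]
    probs = F.softmax(router_logits, dim=-1)                # [T, E]
    _, selected = torch.topk(probs, top_k, dim=-1)          # [T, k]
    expert_mask = F.one_hot(selected, num_experts).float()  # [T, k, E]
    fraction_routed = expert_mask.sum(dim=(0, 1)) / (probs.shape[0] * top_k)  # f_i
    mean_probs = probs.mean(dim=0)                          # P_i
    return num_experts * torch.sum(fraction_routed * mean_probs)
```

The combined auxiliary objective would then look something like `balance_coef * load_balancing_loss(logits) + z_coef * router_z_loss(logits)`, with the coefficients set in the training config.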

This model uses my favorite token-economical, non-ChatML chat prompt format. Messages should be prefixed with " ***System:", " ***Query:", or " ***Response:" for system, user, and model messages respectively. No newlines are necessary, but the space before the triple asterisk is mandatory.
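As a concrete illustration, here is a small helper that assembles a prompt in this format. The helper name `format_prompt` is hypothetical, and placing the message text immediately after the colon is an assumption, since the card only specifies the prefixes themselves.

```python
def format_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Assemble a chat prompt using the " ***System:", " ***Query:",
    and " ***Response:" markers. The leading space before each triple
    asterisk is mandatory; newlines are optional. `turns` is a list of
    (user, model) message pairs; leave the final model message empty
    to prompt a completion.
    """
    prompt = f" ***System:{system}"
    for query, response in turns:
        prompt += f" ***Query:{query} ***Response:{response}"
    return prompt

# Example: a system prompt plus one user message awaiting a response.
prompt = format_prompt(
    "You are Charlotte, a witty ship's navigator.",
    [("Where are we headed?", "")],
)
print(prompt)
```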
