# 🔥 MoE-Mixtral-7B-8Expert
mixtral-8x7b is a Mixture-of-Experts (MoE) model. LLaMA2-Accessory supports its inference and finetuning.
## 🌟 Features
With LLaMA2-Accessory, mixtral-8x7b enjoys the following features:
- Distributed MoE (experts instantiated across multiple processes/GPUs)
- Load Balancing Loss
- Tensor parallelism and FSDP for efficient training
- Distributed and/or quantized inference
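To make the routing and load-balancing ideas above concrete, below is a minimal NumPy sketch of Mixtral-style top-2 expert routing with a Switch-Transformer-style load-balancing auxiliary loss. The function names and the exact loss formulation here are illustrative assumptions for exposition, not LLaMA2-Accessory's actual implementation (which runs distributed, in PyTorch).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def top2_routing_with_aux_loss(logits):
    """Toy top-2 MoE router.

    logits: (tokens, experts) raw router scores.
    Returns per-token expert indices, combination weights, and a
    load-balancing auxiliary loss that is minimized when tokens
    are spread uniformly across experts.
    """
    n_tokens, n_experts = logits.shape
    probs = softmax(logits, axis=-1)

    # Pick the two highest-scoring experts per token.
    top2 = np.argsort(-probs, axis=-1)[:, :2]
    top2_p = np.take_along_axis(probs, top2, axis=-1)
    # Renormalize the two selected probabilities so they sum to 1.
    weights = top2_p / top2_p.sum(axis=-1, keepdims=True)

    # Load-balancing loss: n_experts * sum_e (fraction of assignments
    # to expert e) * (mean router probability of expert e).
    counts = np.bincount(top2.ravel(), minlength=n_experts)
    frac_tokens = counts / top2.size
    mean_prob = probs.mean(axis=0)
    aux_loss = n_experts * float(frac_tokens @ mean_prob)

    return top2, weights, aux_loss

rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 8))  # 16 tokens, 8 experts
idx, w, loss = top2_routing_with_aux_loss(logits)
```

In training, `aux_loss` would be scaled by a small coefficient and added to the language-modeling loss, nudging the router away from collapsing onto a few experts.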
## 🔥 Online Demo
We host a web demo 💻 here, showcasing a mixtral-8x7b model finetuned on evol-codealpaca-v1 and ultrachat_200k with LoRA and bias tuning.
## 💡 Tutorial
A detailed tutorial is available in our documentation.