|
<p align="center"> <img src="https://huggingface.co/Crystalcareai/LlaMoE-Medium/resolve/main/resources/ddb-nye2T3C3vZwJJm1l6A.png" width="auto" title="LlaMoE-Medium model image"> </p> |
|
|
|
This is a 4x8B Llama Mixture of Experts (MoE) model. It was trained on the OpenHermes Resort subset of the Dolphin-2.9 dataset.
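A minimal way to try the model with 🤗 Transformers might look like the sketch below. The repo id is taken from the image path above, and `trust_remote_code=True` is an assumption made here for the custom MoE architecture; adjust to whatever the released checkpoint actually requires.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the image path above; trust_remote_code is assumed
# to be needed for the custom DeepSpeed-MoE-style architecture.
model_id = "Crystalcareai/LlaMoE-Medium"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain what a Mixture of Experts model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```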
|
|
|
The model combines four Llama fine-tunes as experts, using DeepSpeed-MoE's architecture. All experts are active for every token (dense routing, rather than sparse top-k routing).
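For intuition, here is a minimal PyTorch sketch of a dense (all-experts-active) MoE feed-forward block. The layer sizes, the simplified expert MLPs, and the softmax gate are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMoEFFN(nn.Module):
    """Illustrative dense-MoE feed-forward block: every expert processes
    every token and the outputs are mixed by softmax gate weights.
    Dimensions and expert internals are placeholders."""

    def __init__(self, hidden_size: int = 4096, intermediate_size: int = 14336,
                 num_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size, bias=False),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        ])

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # (batch, seq, num_experts) mixing weights; no top-k pruning,
        # so every expert contributes to every token.
        weights = F.softmax(self.gate(hidden_states), dim=-1)
        expert_outputs = torch.stack(
            [expert(hidden_states) for expert in self.experts], dim=-1
        )  # (batch, seq, hidden, num_experts)
        return torch.einsum("bshe,bse->bsh", expert_outputs, weights)


if __name__ == "__main__":
    layer = DenseMoEFFN(hidden_size=64, intermediate_size=128, num_experts=4)
    x = torch.randn(1, 8, 64)
    print(layer(x).shape)  # torch.Size([1, 8, 64])
```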
|
|
|
This is a VERY good model, somewhere between Llama 8B and Llama 70B in capability. Enjoy!
|
|
|
Thank you to: |
|
|
|
- CrusoeEnergy for sponsoring the compute for this project
- My collaborators Eric Hartford and Fernando (has too many names) Neto
|
|