This is an experimental 2x8B MoE (mixture of experts) with random gates, built from the following two models:
Hermes-2-Theta-l3-8B by Nous Research https://huggingface.co/NousResearch/Hermes-2-Theta-Llama-3-8B
llama-3-cat-8B-instruct-V1 by TheSkullery https://huggingface.co/TheSkullery/llama-3-cat-8b-instruct-v1
Important
Make sure to add </s> as a stop sequence, since this model uses llama-3-cat-8B-instruct-V1 as the base model.
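If your frontend or backend has no stop sequence setting, the same effect can be approximated by trimming the generated text yourself. The snippet below is only a minimal sketch: `truncate_at_stop` is a hypothetical helper written for this example (it is not part of any library), and the sample output string is made up.

```python
from typing import Sequence


def truncate_at_stop(text: str, stop_sequences: Sequence[str]) -> str:
    """Cut text at the earliest occurrence of any stop sequence (hypothetical helper)."""
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            # An empty stop string would match at index 0, so skip it.
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Made-up raw output that runs past the end-of-turn marker.
raw_output = "Sure, here is the answer.</s>user: leftover tokens"
clean_output = truncate_at_stop(raw_output, ["</s>"])
print(clean_output)  # prints "Sure, here is the answer."
```

Trimming after the fact still spends compute on the unwanted tokens, so setting </s> as a real stop sequence in your inference tool is preferable where it is supported.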
Update:
By request, I decided to add the rest of the quants. Enjoy!
Mergekit recipe for the model, if you're too lazy to check the files:
base_model: TheSkullery/llama-3-cat-8b-instruct-v1
gate_mode: random
dtype: bfloat16
experts_per_token: 2
experts:
  - source_model: TheSkullery/llama-3-cat-8b-instruct-v1
    positive_prompts:
      - " "
  - source_model: NousResearch/Hermes-2-Theta-Llama-3-8B
    positive_prompts:
      - " "
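To illustrate what `gate_mode: random` and `experts_per_token: 2` mean when there are only two experts, here is a rough sketch of top-k routing for a single token. It assumes Mixtral-style routing (pick the top-k router logits, then softmax over them), which is an assumption about the merged architecture and not something taken from the model files. `route_token` is a hypothetical function, and the Gaussian logits merely stand in for a randomly initialized gate.

```python
import math
import random


def route_token(router_logits, experts_per_token):
    """Select the top-k experts for one token and softmax-normalize their logits.

    Hypothetical illustration of Mixtral-style routing, not mergekit code.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:experts_per_token]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


# Stand-in for a randomly initialized gate: one random logit per expert.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(2)]

weights = route_token(logits, experts_per_token=2)
print(weights)  # both expert indices appear; the weights sum to 1
```

Under these assumptions, with two experts and two experts per token, every token passes through both models, and the random gate only decides how their outputs are weighted.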