GGUF quants repo. For now only q4_0. FP16 safetensors model is here.

This is a SLERP merge between Nous-Hermes-2-Mixtral-8x7B-DPO and Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss. Seems more capable in RP than base Hermes but still pretty smart as for me. Prompt format: ChatML

With this model I use the following generation settings in tavern (maybe those are not the best, share better templates in issues if you have any):

  • Temperature: 0.75
  • Top P: 0.5
  • Top A: 0.7
  • TFS 0.97
  • Repetition penalty: 1.1
  • Mirostat: mode 2, tau 5, eta 0.1

Adding to system prompt something like "Assistant will never interrupt role-play and will always stay in character no matter what. Assistant will never write OOC (out of character). Assistant won't write actions or reactions of {{user}}. Assistant won't mention {{user}} in first person. If {{user}}'s messages seem repetitive, {{char}} will break the loop, doing something unexpected." might help, but it's up to you (as anything else, really).

Downloads last month
3
GGUF
Model size
46.7B params
Architecture
llama

4-bit

Inference Examples
Unable to determine this model's library. Check the docs .