Do you need to fine-tune after merging?
#5
by tanganke - opened
Great model. I wonder how you got the weights for the MoE routers?
You don't need to fine-tune, because there are only two experts.
Thanks for your time!
I am also trying to construct a MoE model like this using mergekit.
The configuration needs to specify a base model and positive prompts. How did you set these?
base_model: ???
gate_mode: hidden
dtype: float32
experts:
  - source_model: NurtureAI/neural-chat-7b-v3-16k  # https://huggingface.co/NurtureAI/neural-chat-7b-v3-16k
    positive_prompts:
      - "???"
    # (optional)
    # negative_prompts:
    #   - "This is a prompt expert_model_1 should not be used for"
  - source_model: mncai/mistral-7b-dpo-v6  # https://huggingface.co/mncai/mistral-7b-dpo-v6
    positive_prompts:
      - "???"
You have to try every candidate and then test the model's performance locally with https://github.com/EleutherAI/lm-evaluation-harness.
I use only the hellaswag metric plus some manual testing.
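For example, a quick hellaswag run could look roughly like the sketch below. It is only a sketch: it assumes a recent lm-evaluation-harness release that exposes lm_eval.simple_evaluate, and ./merged-moe is a hypothetical local path to your merged model; check the harness README for the exact interface of the version you install.

# Minimal sketch: score a merged model on hellaswag via lm-evaluation-harness.
# Assumes lm-eval >= 0.4 and a hypothetical local checkpoint path.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                           # Hugging Face transformers backend
    model_args="pretrained=./merged-moe,dtype=float16",   # hypothetical merged model path
    tasks=["hellaswag"],
    batch_size=8,
)
print(results["results"]["hellaswag"])                    # acc / acc_norm scores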
You will find the best setting sooner or later.
Good luck!
tanganke changed discussion status to closed