Modification to mergekit
Hi @paulilioaica , can you please share what needs to be changed in the mergekit code to make it work for phi-2 and phi-3? I saw you have also successfully merged phi-2 models using mergekit-moe.
In my experiments, after running mergekit-moe with my modifications to accept phi-2 layers, I get a merged model. To actually use this merged model, what changes need to be made to config.json, modeling_phi.py and configuration_phi.py?
Hi.
Broadly put, you need to:
- Find out the model's architecture format and layer names and add them to the architecture.py file, so mergekit can iterate over the layers
- Define a way to store the multiple linear layers (the experts) in the merged model
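For the second point, here is a minimal sketch of one common way to store the experts, borrowed from Mixtral-style MoE blocks: an `nn.ModuleList` of MLPs plus a small router. All class and parameter names here (`ExpertMLP`, `SparseMoEBlock`, `num_experts_per_token`) are illustrative assumptions, not the actual names used in the notebook:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertMLP(nn.Module):
    """One expert: same shape as the dense Phi MLP it replaces."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, intermediate_size)
        self.fc2 = nn.Linear(intermediate_size, hidden_size)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))

class SparseMoEBlock(nn.Module):
    """Hypothetical MoE block: a gate routes each token to its top-k experts."""
    def __init__(self, hidden_size, intermediate_size, num_experts, num_experts_per_token):
        super().__init__()
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            ExpertMLP(hidden_size, intermediate_size) for _ in range(num_experts)
        )
        self.top_k = num_experts_per_token

    def forward(self, x):                      # x: (tokens, hidden)
        logits = self.gate(x)                  # (tokens, num_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Storing the experts as a ModuleList means each merged phi MLP's weights can be copied into one `ExpertMLP` entry by name.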
Once you have a merged model, you need to tell Hugging Face how to use it:
- Add some special config params, such as num_experts and num_experts_per_token (the number of active experts), and read them in the init
- Modify the Phi3 PyTorch implementation to accept the experts (stored in the format chosen in step 2.)
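To make the config part concrete, here is a sketch of what reading the extra params in the init could look like. The field names are assumptions modeled on Mixtral-style configs; in the real configuration_phi.py you would extend the existing config class, which subclasses transformers.PretrainedConfig, rather than writing a standalone class:

```python
class PhiMoEConfig:
    """Sketch only: extra MoE params read in __init__ with defaults,
    so a checkpoint without them still loads as a plain dense model."""
    def __init__(
        self,
        hidden_size: int = 2560,
        intermediate_size: int = 10240,
        num_experts: int = 1,            # assumed name; 1 == ordinary dense MLP
        num_experts_per_token: int = 1,  # assumed name; active experts per token
        **kwargs,
    ):
        self.hidden_size = hidden_size
        self.intermediate_size = intermediate_size
        self.num_experts = num_experts
        self.num_experts_per_token = num_experts_per_token
        # In the real file you would also forward **kwargs to
        # PretrainedConfig.__init__ via super().__init__(**kwargs).
        self.extra = kwargs
```

The modeling code then reads `config.num_experts` / `config.num_experts_per_token` when it builds each layer's MoE block.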
For Phi3, if you use the same format (just run the notebook I linked), it should work automatically, since I copy some configs and files over.
Otherwise, you are looking at modifying modeling_phi.py, configuration_phi3.py and config.json, adding your extra params there.
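For the config.json side, the change is just adding the new fields alongside the existing entries; values and field names below are illustrative:

```json
{
  "architectures": ["PhiForCausalLM"],
  "model_type": "phi",
  "hidden_size": 2560,
  "intermediate_size": 10240,
  "num_experts": 4,
  "num_experts_per_token": 2
}
```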
Hope this gives you a good idea.