Modification to mergekit

#1
by vhug - opened

Hi @paulilioaica , can you please share what needs to be changed in the mergekit code to make it work for phi2 and phi3? I saw you have also successfully merged phi-2 models using mergekit-moe.

In my experiments, after running mergekit-moe with my modifications to accept phi2 layers, I get a merged model. To actually use this merged model, what changes need to be made to config.json, modeling_phi.py, and configuration_phi.py?

Hi.
Broadly, you need to:

  1. Find the model's architecture format and layer names and add them to the architecture.py file so mergekit can iterate over the layers
  2. Define a way to store the multiple linear layers (the experts) in the merged model
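For step 2, one simple option is to pick a naming scheme for the expert weights in the checkpoint's state dict. The sketch below is purely illustrative (the layer and parameter names are assumptions, not mergekit's or Phi's actual internals), but it shows the idea of laying out per-expert MLP weights under a predictable key pattern:

```python
# Hypothetical naming scheme for storing expert MLP weights in the merged
# checkpoint's state dict. The key pattern is an assumption for illustration,
# not the real Phi/mergekit layout.
def expert_param_names(layer_idx: int, num_experts: int) -> list[str]:
    """Generate state-dict keys for every expert's MLP weights in one layer."""
    names = []
    for e in range(num_experts):
        for suffix in ("fc1.weight", "fc1.bias", "fc2.weight", "fc2.bias"):
            names.append(f"model.layers.{layer_idx}.mlp.experts.{e}.{suffix}")
    return names

# For layer 0 with 2 experts, this yields 8 parameter keys, e.g.
# "model.layers.0.mlp.experts.0.fc1.weight".
print(expert_param_names(0, 2))
```

Whatever scheme you pick, the modeling code later has to load the experts back using the same keys, so consistency matters more than the exact names.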

Once you have a merged model, you need to tell Hugging Face how to use it:

  1. Add some extra config params, such as num_experts and num_experts_per_token (the number of active experts), and read them in the model's init
  2. Modify the Phi3 PyTorch implementation to accept the experts (stored in the format you chose in step 2 above)
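For the config side, the change is small: the extra params just need to land as attributes so the modeling code can read them in its init. A minimal sketch, using a plain stand-in class rather than the real transformers PretrainedConfig:

```python
# Hypothetical sketch of carrying the extra MoE params in the config.
# PhiMoEConfig is a stand-in name; in practice you would subclass the
# existing Phi configuration class in configuration_phi.py.
class PhiMoEConfig:
    def __init__(self, num_experts=2, num_experts_per_token=2, **kwargs):
        # Store the extra params so the modeling code can read them in __init__
        self.num_experts = num_experts
        self.num_experts_per_token = num_experts_per_token
        # Pass any remaining (standard) config fields through unchanged
        for key, value in kwargs.items():
            setattr(self, key, value)

cfg = PhiMoEConfig(num_experts=4, num_experts_per_token=2, hidden_size=2560)
print(cfg.num_experts, cfg.num_experts_per_token)  # → 4 2
```

The modeling code then reads cfg.num_experts when building the expert list and cfg.num_experts_per_token when routing tokens at forward time.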

For Phi3, if you use the same format (just run the notebook I linked), it should work automatically, since I copy some configs and files over.
Otherwise, you are looking at modifying modeling_phi.py, configuration_phi3.py, and config.json, where you add your extra params.
Hope this gives you a good idea.
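To make the config.json side concrete, here is a hypothetical excerpt showing where the extra params would sit; all field values are illustrative, and only the two MoE entries are the point:

```json
{
  "architectures": ["PhiForCausalLM"],
  "model_type": "phi",
  "num_experts": 4,
  "num_experts_per_token": 2
}
```

The custom configuration class reads these two fields at load time, and the modeling code uses them to build and route between the experts.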
