Modification to mergekit

#1
by vhug - opened

Hi @paulilioaica , can you please share what needs to be changed in the mergekit code to make it work for phi2 and phi3? I saw you have also successfully merged phi-2 models using mergekit-moe.

In my experiments, after running mergekit-moe with my modifications to accept phi2 layers, I get a merged model. To actually use this merged model, what changes need to be made to config.json, modeling_phi.py, and configuration_phi.py?

Hi.
Broadly, you need to:

  1. Find the model's architecture format and layer names and add them to the architecture.py file so mergekit can iterate over the layers
  2. Define a way to store the multiple linear layers (the experts) in the merged model
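For step 2, one simple option is to pick a naming scheme for the expert weights in the checkpoint's state dict. The sketch below is purely illustrative (the layer and parameter names are assumptions, not mergekit's or Phi's actual internals), but it shows the idea of laying out per-expert MLP weights under a predictable key pattern:

```python
# Hypothetical naming scheme for storing expert MLP weights in the merged
# checkpoint's state dict. The key pattern is an assumption for illustration,
# not the real Phi/mergekit layout.
def expert_param_names(layer_idx: int, num_experts: int) -> list[str]:
    """Generate state-dict keys for every expert's MLP weights in one layer."""
    names = []
    for e in range(num_experts):
        for suffix in ("fc1.weight", "fc1.bias", "fc2.weight", "fc2.bias"):
            names.append(f"model.layers.{layer_idx}.mlp.experts.{e}.{suffix}")
    return names

# For layer 0 with 2 experts, this yields 8 parameter keys, e.g.
# "model.layers.0.mlp.experts.0.fc1.weight".
print(expert_param_names(0, 2))
```

Whatever scheme you pick, the modeling code later has to load the experts back using the same keys, so consistency matters more than the exact names.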

Once you have a merged model, you need to tell Hugging Face how to use it:

  1. Add some extra config params, such as num_experts and num_experts_per_token (the number of active experts), and read them in the model's init
  2. Modify the Phi3 PyTorch implementation to accept the experts (stored in the format you chose in step 2 above)
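For the config side, the change is small: the extra params just need to land as attributes so the modeling code can read them in its init. A minimal sketch, using a plain stand-in class rather than the real transformers PretrainedConfig:

```python
# Hypothetical sketch of carrying the extra MoE params in the config.
# PhiMoEConfig is a stand-in name; in practice you would subclass the
# existing Phi configuration class in configuration_phi.py.
class PhiMoEConfig:
    def __init__(self, num_experts=2, num_experts_per_token=2, **kwargs):
        # Store the extra params so the modeling code can read them in __init__
        self.num_experts = num_experts
        self.num_experts_per_token = num_experts_per_token
        # Pass any remaining (standard) config fields through unchanged
        for key, value in kwargs.items():
            setattr(self, key, value)

cfg = PhiMoEConfig(num_experts=4, num_experts_per_token=2, hidden_size=2560)
print(cfg.num_experts, cfg.num_experts_per_token)  # → 4 2
```

The modeling code then reads cfg.num_experts when building the expert list and cfg.num_experts_per_token when routing tokens at forward time.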

For Phi3, if you use the same format (just run the notebook I linked), it should work automatically, since I copy some configs and files over.
Otherwise, you are looking at modifying modeling_phi.py, configuration_phi3.py, and config.json, where you add your extra params.
Hope this gives you a good idea.
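To make the config.json side concrete, here is a hypothetical excerpt showing where the extra params would sit; all field values are illustrative, and only the two MoE entries are the point:

```json
{
  "architectures": ["PhiForCausalLM"],
  "model_type": "phi",
  "num_experts": 4,
  "num_experts_per_token": 2
}
```

The custom configuration class reads these two fields at load time, and the modeling code uses them to build and route between the experts.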
