|
--- |
|
license: apache-2.0 |
|
tags: |
|
- moe |
|
- frankenmoe |
|
- merge |
|
- mergekit |
|
- lazymergekit |
|
- phi3_mergekit |
|
- microsoft/Phi-3-mini-128k-instruct |
|
base_model: |
|
- microsoft/Phi-3-mini-128k-instruct |
|
- microsoft/Phi-3-mini-128k-instruct |
|
--- |
|
|
|
|
|
# MixtureOfPhi3 |
|
|
|
<p align="center"> |
|
<img src="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11201acc-4089-416d-921b-cbd71fbf8ddb_1024x1024.jpeg" width="300" class="center"/> |
|
</p> |
|
|
|
|
|
**MixtureOfPhi3** is a Mixture of Experts (MoE) built from the following models using mergekit:
|
* [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
|
* [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
|
|
|
This model was created using [LazyMergekit-Phi3](https://colab.research.google.com/drive/1Upb8JOAS3-K-iemblew34p9h1H6wtCeU?usp=sharing).
|
|
|
This run is for development purposes only: merging two identical models brings no performance benefit. However, once specialized finetunes of Phi-3 become available, this setup will serve as a starting point for building an MoE from them.
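Since both experts here are copies of the same model, the router's choice does not matter yet, but the mechanics are the same as in any sparse MoE: a small gate scores each token and dispatches it to its top-k experts. A minimal sketch of the idea (illustrative only; `SparseMoE` and the linear stand-in experts are made up, not the model's actual code):

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (stand-in for the real FFN experts)."""

    def __init__(self, hidden_size, num_experts=2, top_k=2):
        super().__init__()
        # The router: one score per expert for every token.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Stand-in experts; in the merged model these are full Phi-3 MLP blocks.
        self.experts = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        scores = self.gate(x)                                 # (tokens, experts)
        weights, chosen = torch.topk(scores, self.top_k, -1)  # per-token top-k
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE(hidden_size=8)
y = moe(torch.randn(4, 8))  # four tokens in, four tokens out
```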
|
|
|
## ©️ Credits |
|
* [mlabonne's phixtral](https://huggingface.co/mlabonne/phixtral-4x2_8), whose inference code I adapted to Phi-3's architecture.

* [mergekit](https://github.com/cg123/mergekit), whose code I tweaked to merge Phi-3 models.
|
|
|
|
|
The experts were merged using the `cheap_embed` gate mode, which assigns each expert a routing vector built from the embeddings of its prompt words, so that experts can specialize in areas such as scientific work, reasoning, or math.
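In rough terms, `cheap_embed` derives each expert's routing vector from the raw input embeddings of its positive-prompt words, without any forward pass through the model. A sketch of the idea (the toy vocabulary and the `gate_vector` helper are made up for illustration, not mergekit's actual implementation):

```python
import torch

# Toy embedding table standing in for the base model's input embeddings.
vocab = {"research": 0, "logic": 1, "math": 2, "science": 3, "creative": 4, "art": 5}
embeddings = torch.randn(len(vocab), 8)

def gate_vector(prompt):
    # "Cheap": average raw token embeddings; no model forward pass needed.
    ids = [vocab[word] for word in prompt.replace(",", "").split()]
    return embeddings[ids].mean(dim=0)

# One routing vector per expert, from the positive_prompts in the
# configuration below.
router_weights = torch.stack([
    gate_vector("research, logic, math, science"),
    gate_vector("creative, art"),
])
```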
|
|
|
Try your own in the notebook linked above!
|
|
|
|
|
## 🧩 Configuration |
|
|
|
```yaml |
|
base_model: microsoft/Phi-3-mini-128k-instruct |
|
gate_mode: cheap_embed |
|
dtype: float16 |
|
experts: |
|
- source_model: microsoft/Phi-3-mini-128k-instruct |
|
positive_prompts: ["research, logic, math, science"] |
|
- source_model: microsoft/Phi-3-mini-128k-instruct |
|
positive_prompts: ["creative, art"] |
|
``` |
|
|
|
## 💻 Usage |
|
|
|
```python |
|
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "paulilioaica/MixtureOfPhi3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

prompt = "How many continents are there?"
# Phi-3 chat format: each turn is closed with <|end|>.
input_text = f"<|system|>\nYou are a helpful AI assistant.<|end|>\n<|user|>\n{prompt}<|end|>\n<|assistant|>"
tokenized_input = tokenizer.encode(input_text, return_tensors="pt")

outputs = model.generate(
    tokenized_input,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0]))
|
``` |