|
--- |
|
license: apache-2.0 |
|
tags: |
|
- moe |
|
- frankenmoe |
|
- merge |
|
- mergekit |
|
- lazymergekit |
|
- phi3_mergekit |
|
- microsoft/Phi-3-mini-128k-instruct |
|
base_model: |
|
- microsoft/Phi-3-mini-128k-instruct |
|
- microsoft/Phi-3-mini-128k-instruct |
|
--- |
|
|
|
|
|
# MixtureOfPhi3 |
|
|
|
<p align="center"> |
|
<img src="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F11201acc-4089-416d-921b-cbd71fbf8ddb_1024x1024.jpeg" width="300" class="center"/> |
|
</p> |
|
|
|
|
|
**MixtureOfPhi3** is a Mixture of Experts (MoE) built from the following models using mergekit:
|
* [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
|
* [Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) |
|
|
|
This model was created using [LazyMergekit-Phi3](https://colab.research.google.com/drive/1Upb8JOAS3-K-iemblew34p9h1H6wtCeU?usp=sharing).
|
|
|
This run is for development purposes only: merging two identical models brings no performance benefit. However, once specialized finetunes of Phi-3 become available, this setup will serve as a starting point for building an MoE from them.
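Since both experts here are copies of the same model, the router's choice does not matter yet, but the mechanics are the same as in any sparse MoE: a small gate scores each token and dispatches it to its top-k experts. A minimal sketch of the idea (illustrative only; `SparseMoE` and the linear stand-in experts are made up, not the model's actual code):

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (stand-in for the real FFN experts)."""

    def __init__(self, hidden_size, num_experts=2, top_k=2):
        super().__init__()
        # The router: one score per expert for every token.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Stand-in experts; in the merged model these are full Phi-3 MLP blocks.
        self.experts = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        # x: (num_tokens, hidden_size)
        scores = self.gate(x)                                 # (tokens, experts)
        weights, chosen = torch.topk(scores, self.top_k, -1)  # per-token top-k
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = SparseMoE(hidden_size=8)
y = moe(torch.randn(4, 8))  # four tokens in, four tokens out
```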
|
|
|
## ©️ Credits |
|
* [mlabonne's phixtral](https://huggingface.co/mlabonne/phixtral-4x2_8), whose inference code I adapted to Phi-3's architecture.

* [mergekit](https://github.com/cg123/mergekit), whose code I tweaked to merge Phi-3 models.
|
|
|
|
|
The experts were merged using the `cheap_embed` gate mode, which assigns each expert a routing vector built from the embeddings of its prompt words, so that experts can specialize in areas such as scientific work, reasoning, or math.
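In rough terms, `cheap_embed` derives each expert's routing vector from the raw input embeddings of its positive-prompt words, without any forward pass through the model. A sketch of the idea (the toy vocabulary and the `gate_vector` helper are made up for illustration, not mergekit's actual implementation):

```python
import torch

# Toy embedding table standing in for the base model's input embeddings.
vocab = {"research": 0, "logic": 1, "math": 2, "science": 3, "creative": 4, "art": 5}
embeddings = torch.randn(len(vocab), 8)

def gate_vector(prompt):
    # "Cheap": average raw token embeddings; no model forward pass needed.
    ids = [vocab[word] for word in prompt.replace(",", "").split()]
    return embeddings[ids].mean(dim=0)

# One routing vector per expert, from the positive_prompts in the
# configuration below.
router_weights = torch.stack([
    gate_vector("research, logic, math, science"),
    gate_vector("creative, art"),
])
```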
|
|
|
Try your own in the notebook linked above!
|
|
|
|
|
## 🧩 Configuration |
|
|
|
```yaml |
|
base_model: microsoft/Phi-3-mini-128k-instruct |
|
gate_mode: cheap_embed |
|
dtype: float16 |
|
experts: |
|
- source_model: microsoft/Phi-3-mini-128k-instruct |
|
positive_prompts: ["research, logic, math, science"] |
|
- source_model: microsoft/Phi-3-mini-128k-instruct |
|
positive_prompts: ["creative, art"] |
|
``` |
|
|
|
## 💻 Usage |
|
|
|
```python |
|
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "paulilioaica/MixtureOfPhi3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)

prompt = "How many continents are there?"
# Phi-3 chat format: each turn is closed with <|end|>.
input_text = f"<|system|>\nYou are a helpful AI assistant.<|end|>\n<|user|>\n{prompt}<|end|>\n<|assistant|>"
tokenized_input = tokenizer.encode(input_text, return_tensors="pt")

outputs = model.generate(
    tokenized_input,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0]))
|
``` |