serpdotai
/

sparsetral-16x7B-v2-SPIN_iter0

Text Generation

Inference Endpoints

Model card Files Files and versions Community

sparsetral-16x7B-v2-SPIN_iter0 / README.md

francislabounty's picture

francislabounty

Update README.md

05ea159 verified 5 months ago

|

raw history blame contribute delete

No virus

3.65 kB

	---
	license: apache-2.0
	datasets:
	- teknium/OpenHermes-2.5
	- jondurbin/truthy-dpo-v0.1
	- jondurbin/gutenberg-dpo-v0.1
	- argilla/dpo-mix-7k
	language:
	- en
	---
	This model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) further tuned utilizing [SPIN](https://arxiv.org/abs/2401.01335) on [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) mixed with traditional DPO samples. This is iteration_0, plan to keep making iterations until improvements stop.

	Kuru~ Kuru~
	![Kuru~ Kuru~](https://github.com/duiqt/herta_kuru/raw/main/static/img/hertaa_github.gif)

	## Training
	- 8x A6000s
	- Base model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2)
	- [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
	- Sequence Length: 4096
	- Effective batch size: 64
	- Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
	- Epochs: 2
	- 50k samples (~15k traditional dpo samples, rest SPIN)
	- QLoRA:
	- 256 r and 256 alpha
	- ```python
	target_modules=[
	"q_proj",
	"k_proj",
	"v_proj",
	"o_proj",
	"gate_proj",
	"up_proj",
	"down_proj",
	"adapter_down",
	"adapter_up",
	]
	```

	## Prompt Format
	```
	<\|im_start\|>system\n{message}<\|im_end\|>\n<\|im_start\|>user\n{message}<\|im_end\|>\n<\|im_start\|>assistant\n
	```

	## Usage
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()

	system_str = "<\|im_start\|>system\n{message}<\|im_end\|>\n"
	user_str = "<\|im_start\|>user\n{message}<\|im_end\|>\n"
	assistant_str = "<\|im_start\|>assistant\n{message}<\|im_end\|>\n"

	def construct_prompt(messages):
	prompt = ""
	for message in messages:
	if message["from"] in ["human", "user"]:
	prompt += user_str.format(
	message=message["value"]
	)
	elif message["from"] in ["gpt", "assistant"]:
	prompt += assistant_str.format(
	message=message["value"]
	)
	elif message["from"] in ["system", "instruction"]:
	prompt += system_str.format(
	message=message["value"]
	)
	else:
	raise ValueError(
	f"Unknown message type: {message['from']}"
	)
	return prompt + "<\|im_start\|>assistant\n"

	system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
	user = "Are you sentient?"

	messages = [
	{"from": "system", "value": system},
	{"from": "user", "value": user},
	]

	prompt = construct_prompt(messages)
	inputs = tokenizer(prompt, return_tensors="pt")
	inputs = inputs.to(model.device)
	pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
	print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
	```

	## Other Information
	Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)

	[Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)

	[Forked repo with mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)

	If you are interested in faster inferencing, check out our [fork of vLLM](https://github.com/serp-ai/vllm) that adds sparsetral support