---
license: apache-2.0
datasets:
- teknium/OpenHermes-2.5
- jondurbin/truthy-dpo-v0.1
- jondurbin/gutenberg-dpo-v0.1
- argilla/dpo-mix-7k
language:
- en
---

This model is [sparsetral-16x7B-v2](https://huggingface.co/serpdotai/sparsetral-16x7B-v2) further tuned with [SPIN](https://arxiv.org/abs/2401.01335) on [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) mixed with traditional DPO samples. This is iteration_1. Further training runs are paused for now while we move from [LoRA](https://arxiv.org/abs/2106.09685) to [DoRA](https://arxiv.org/pdf/2402.09353.pdf). We may also start over with a v3 for proper chat token support, and we are debating adding function tokens and function calling. If there are tasks Sparsetral has been weak at, feel free to send us some prompts/chats with the desired completions and we will see about making sure your task is supported!

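As a rough illustration of what a SPIN step optimizes, here is a minimal sketch of the per-sample logistic loss described in the SPIN paper. The completion from the dataset plays the "chosen" role and a completion generated by the previous iteration plays the "rejected" role, which gives the loss the same shape as a DPO loss. The function name `spin_loss`, the `beta` value, and the example log-probabilities are illustrative assumptions, not values from this training run.

```python
import math


def spin_loss(policy_real, ref_real, policy_synth, ref_synth, beta=0.1):
    """Logistic SPIN loss for a single prompt.

    Every argument is the summed log-probability of a full completion:
      *_real   -> the ground-truth completion from the dataset
      *_synth  -> a completion generated by the previous iteration
      policy_* -> scored by the model being trained
      ref_*    -> scored by the frozen previous-iteration model
    `beta` is an assumed scaling factor, not a value from this card.
    """
    margin = beta * ((policy_real - ref_real) - (policy_synth - ref_synth))
    # log(1 + exp(-margin)), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Made-up log-probabilities: the policy has moved toward the real completion
# and away from the self-generated one, so the loss drops below log(2).
print(spin_loss(-8.0, -10.0, -14.0, -12.0))
```

When the policy and the reference score both completions identically, the margin is zero and the loss equals `log(2)`; it shrinks as the policy favors the dataset completion over the self-generated one.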
![](https://i.imgflip.com/8g9jr4.jpg)

Kuru~ Kuru~

![Kuru~ Kuru~](https://github.com/duiqt/herta_kuru/raw/main/static/img/hertaa_github.gif)

## Training
- 8x A6000s
- Base model is [sparsetral-16x7B-v2-SPIN_iter0](https://huggingface.co/serpdotai/sparsetral-16x7B-v2-SPIN_iter0)
- [Forked version of unsloth](https://github.com/serp-ai/unsloth) for efficient training
- Sequence Length: 4096
- Effective batch size: 64
- Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
- Epochs: 2
- 100K samples (50K new SPIN + 50K from iter_0)
- QLoRA:
    - r = 256, alpha = 256
    - ```python
      target_modules=[
          "q_proj",
          "k_proj",
          "v_proj",
          "o_proj",
          "gate_proj",
          "up_proj",
          "down_proj",
          "adapter_down",
          "adapter_up",
      ]
      ```

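To make the hyperparameters above concrete, the sketch below works out how many optimizer steps they imply and how the learning rate would move over the run. It assumes the common schedule of linear warmup from 0 to the peak followed by linear decay to 0, and that all 100K samples are seen once per epoch with the last partial batch kept. The card does not spell out either detail, and the helper name `lr_at_step` is hypothetical.

```python
import math

# Values taken from the training list above
NUM_GPUS = 8
SAMPLES = 100_000
EPOCHS = 2
EFFECTIVE_BATCH = 64
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1

# Sequences each GPU contributes per optimizer step; how this is split between
# per-device batch size and gradient accumulation is not stated in the card.
per_gpu_per_step = EFFECTIVE_BATCH // NUM_GPUS

steps_per_epoch = math.ceil(SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at_step(step):
    """Learning rate at optimizer step `step` (0-indexed), assuming
    linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(per_gpu_per_step, steps_per_epoch, total_steps, warmup_steps)
print(lr_at_step(0), lr_at_step(warmup_steps), lr_at_step(total_steps))
```

Under these assumptions the run is a little over 3,000 optimizer steps, with roughly the first tenth spent ramping up to the peak rate of 5e-7.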
## Prompt Format
```
<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n
```

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as given in this card (the iter0 checkpoint); swap in the checkpoint you want to load.
tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()

# One template per role, matching the prompt format above
system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"

def construct_prompt(messages):
    """Render a list of {"from": ..., "value": ...} messages into one prompt string."""
    prompt = ""
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(message=message["value"])
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(message=message["value"])
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(message=message["value"])
        else:
            raise ValueError(f"Unknown message type: {message['from']}")
    # Leave the assistant turn open so the model writes the reply
    return prompt + "<|im_start|>assistant\n"

system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"

messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]

prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
# Sample a single completion; max_length counts prompt tokens plus generated tokens
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```

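The snippet above decodes the whole returned sequence, so the printed text contains the prompt as well as the reply. For decoder-only models, `generate` returns the prompt token ids followed by the newly generated ones, so one option is to slice the prompt off before decoding. The helper below is a minimal sketch of that idea on plain Python lists; `continuation_ids` is a hypothetical name and the token ids in the example are made up.

```python
def continuation_ids(output_ids, prompt_len):
    """Return only the ids generated after the first `prompt_len` prompt tokens."""
    if prompt_len > len(output_ids):
        raise ValueError("prompt_len is longer than the generated sequence")
    return list(output_ids[prompt_len:])


# Made-up ids: 4 prompt tokens followed by 3 generated tokens
example_output = [1, 523, 28766, 321, 9906, 28725, 2]
reply_ids = continuation_ids(example_output, prompt_len=4)
print(reply_ids)
```

With the objects from the usage example, this would look like `tokenizer.decode(continuation_ids(pred[0].tolist(), inputs["input_ids"].shape[1]), skip_special_tokens=True)`.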
## Other Information
Paper reference: [Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks](https://arxiv.org/abs/2401.02731)

[Original Paper repo](https://github.com/wuhy68/Parameter-Efficient-MoE)

[Forked repo with Mistral support (sparsetral)](https://github.com/serp-ai/Parameter-Efficient-MoE)

If you are interested in faster inference, check out our [fork of vLLM](https://github.com/serp-ai/vllm), which adds sparsetral support.