license: apache-2.0
datasets:
  - teknium/OpenHermes-2.5
  - jondurbin/truthy-dpo-v0.1
  - jondurbin/gutenberg-dpo-v0.1
  - argilla/dpo-mix-7k
language:
  - en

This model is sparsetral-16x7B-v2, further tuned with SPIN on OpenHermes-2.5 mixed with traditional DPO samples. This is iteration 0; the plan is to keep running iterations until improvements stop.

Training

8x A6000s
Base model is sparsetral-16x7B-v2
Forked version of unsloth for efficient training
Sequence Length: 4096
Effective batch size: 64
Learning Rate: 5e-7 with linear decay (0.1 warmup ratio)
Epochs: 2
50k samples (~15k traditional DPO samples, the rest SPIN)
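
To make the schedule concrete, here is a minimal sketch of linear warmup followed by linear decay using the numbers listed above. It assumes one optimizer step per effective batch of 64 and rounds the step count up; the function names are illustrative, and the actual trainer may round differently or end at a different value.

```python
import math

# Values taken from the training list above
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1
SAMPLES = 50_000
EPOCHS = 2
EFFECTIVE_BATCH = 64


def total_steps(samples=SAMPLES, epochs=EPOCHS, batch=EFFECTIVE_BATCH):
    # Assumption: one optimizer step per effective batch, partial batch kept.
    return math.ceil(samples * epochs / batch)


def lr_at(step, total, peak=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    # Linear ramp from 0 to peak during warmup, then linear decay to 0.
    warmup = math.ceil(total * warmup_ratio)
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


steps = total_steps()
for s in (0, steps // 10, steps // 2, steps):
    print(s, lr_at(s, steps))
```

Under these assumptions the run is roughly 1.5k optimizer steps, with the first tenth spent ramping up to the peak learning rate.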

QLoRA:

r = 256, alpha = 256

target_modules=[
    "q_proj",
    "k_proj",
    "v_proj",
    "o_proj",
    "gate_proj",
    "up_proj",
    "down_proj",
    "adapter_down",
    "adapter_up",
]
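
As a rough illustration of what these settings imply, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters a rank-256 adapter adds to a single linear layer. The 4096 x 4096 layer shape is a hypothetical example, not a figure from this model card, and how the forked training code consumed these values is not shown here.

```python
R = 256
ALPHA = 256


def lora_scaling(r=R, alpha=ALPHA):
    # Standard LoRA multiplies the low-rank update by alpha / r.
    return alpha / r


def lora_params(d_in, d_out, r=R):
    # The adapter is two matrices: A with shape (r, d_in) and B with shape (d_out, r).
    return r * (d_in + d_out)


print(lora_scaling())
# Hypothetical layer shape, for illustration only
print(lora_params(4096, 4096))
```

With r equal to alpha, the adapter update is applied at unit scale, so the effective update magnitude is governed by the learning rate alone.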

Prompt Format

<|im_start|>system\n{message}<|im_end|>\n<|im_start|>user\n{message}<|im_end|>\n<|im_start|>assistant\n

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("serpdotai/sparsetral-16x7B-v2-SPIN_iter0", device_map="auto", trust_remote_code=True).eval()

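# ChatML-style templates, one per role
system_str = "<|im_start|>system\n{message}<|im_end|>\n"
user_str = "<|im_start|>user\n{message}<|im_end|>\n"
assistant_str = "<|im_start|>assistant\n{message}<|im_end|>\n"

def construct_prompt(messages):
    """Render a list of {"from", "value"} messages and open the assistant turn."""
    prompt = ""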
    for message in messages:
        if message["from"] in ["human", "user"]:
            prompt += user_str.format(
                message=message["value"]
            )
        elif message["from"] in ["gpt", "assistant"]:
            prompt += assistant_str.format(
                message=message["value"]
            )
        elif message["from"] in ["system", "instruction"]:
            prompt += system_str.format(
                message=message["value"]
            )
        else:
            raise ValueError(
                f"Unknown message type: {message['from']}"
            )
    return prompt + "<|im_start|>assistant\n"

system = "You are a helpful assistant who will help the user to the best of their ability. If you don't know something, say \"I don't know\""
user = "Are you sentient?"

messages = [
    {"from": "system", "value": system},
    {"from": "user", "value": user},
]

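# Build the prompt, tokenize it, and move the tensors to the model's device
prompt = construct_prompt(messages)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = inputs.to(model.device)
# Sample one completion; max_length counts prompt tokens plus generated tokens
pred = model.generate(**inputs, max_length=4096, do_sample=True, top_k=50, top_p=0.99, temperature=0.9, num_return_sequences=1)
# Decode the full sequence (prompt followed by the completion), skipping special tokens
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))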
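
The decoded text above includes the prompt as well as the reply, because `generate` returns the input tokens followed by the new ones for causal language models. If only the reply is wanted, one option is to slice off the prompt tokens before decoding. The helper below is a hypothetical sketch (`completion_ids` is not part of transformers) and uses stand-in token ids so it can be run without loading the model.

```python
def completion_ids(output_ids, prompt_len):
    """Return only the tokens that follow the first prompt_len tokens."""
    return output_ids[prompt_len:]


# With the variables from the usage snippet, this would look like:
#   prompt_len = inputs["input_ids"].shape[1]
#   reply = tokenizer.decode(completion_ids(pred[0], prompt_len), skip_special_tokens=True)

# Stand-in token ids for a quick check without the model
fake_prompt = [1, 2, 3]
fake_output = fake_prompt + [10, 11]
print(completion_ids(fake_output, len(fake_prompt)))
```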

Other Information

Paper reference: Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

Original Paper repo

Forked repo with Mistral support (sparsetral)

If you are interested in faster inference, check out our fork of vLLM, which adds sparsetral support.