---
tags:
- merge
- mergekit
- lazymergekit
library_name: transformers
pipeline_tag: text-generation
---
# NemoDori-v0.1-12B-MS
NemoDori-v0.1-12B-MS is a **model_stock** merge of the models listed below, created with LazyMergekit (see the merge configuration below; all credits go to the original model authors).
This is my 'first' merge model, made just for testing purposes. I don't really know what I'm doing, honestly...
My experience using this in SillyTavern:
- It advances the story slowly, responding to the last message quite nicely.
- Creativity is good, sometimes surprising me with responses close to what I was hoping to get.
- It may skip ahead in time when the last message includes words that resemble a promise (or literally mention time).
- Sometimes it gives a long response, but it tends to fit the overall roleplay, I think...
## Prompt and Preset
ChatML works best so far. Llama 3 and Mistral prompts also work, but with them the model sometimes speaks for you. (ChatML may also speak for you, but not as often; simply re-generate.)
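For reference, ChatML wraps each turn like this (a generic sketch of the format itself, not a template pulled from this model's tokenizer):

```
<|im_start|>system
You are a helpful roleplay assistant.<|im_end|>
<|im_start|>user
Hello there!<|im_end|>
<|im_start|>assistant
```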
I use context and instruct templates from here (credits to Virt-io).
This is the preset I use for SillyTavern; it should be good enough. Tweak to your heart's content (a rough plain-transformers equivalent is sketched after this list):
- temperature can go higher (I stopped at 2),
- skip special tokens may or may not be needed. If responses end with "assistant" or "user", try disabling the checkbox. (I did get that in my first couple of tries, but not anymore; not sure why...)
- context length is still coherent at 28k tokens, based on my own testing.
- everything else is... just fine, as long as you're not forcing it.
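Outside SillyTavern, those settings map roughly onto plain transformers generation like this (a minimal sketch; `temperature=1.2` is just an illustrative value within the range above, and `skip_special_tokens` mirrors the checkbox):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "RozGrov/NemoDori-v0.1-12B-MS"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Tell me a short story.", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.2,  # illustrative value; can go higher (I stopped at 2)
)
# skip_special_tokens plays the role of the "skip special tokens" checkbox
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```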
## 🧩 Configuration
```yaml
models:
  - model: Sao10K/MN-12B-Lyra-v1
  - model: Fizzarolli/MN-12b-Rosier-v1
  - model: MarinaraSpaghetti/Nemomix-v4.0-12B
  - model: aetherwiing/MN-12B-Starcannon-v2
merge_method: model_stock
base_model: aetherwiing/MN-12B-Starcannon-v2
dtype: bfloat16
```
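To reproduce the merge yourself, something like the following should work (a sketch assuming mergekit is installed and the configuration above is saved as `config.yaml`; the output path is arbitrary):

```bash
pip install mergekit
mergekit-yaml config.yaml ./NemoDori-v0.1-12B-MS --copy-tokenizer
```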
## 💻 Usage
```bash
pip install -qU transformers accelerate
```

```python
from transformers import AutoTokenizer
import transformers
import torch

model = "RozGrov/NemoDori-v0.1-12B-MS"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Build the prompt with the chat template bundled in the model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
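Note that the merge itself was done in bfloat16 while the snippet above loads in float16; on hardware with bfloat16 support you can pass `torch_dtype=torch.bfloat16` instead to match the merge dtype.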