	---
	tags:
	- merge
	- mergekit
	- lazymergekit
	library_name: transformers
	pipeline_tag: text-generation
	---

	# NemoDori-v0.1-12B-MS

	NemoDori-v0.1-12B-MS is a Model Stock merge of the models listed in the configuration below, made with [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing). All credit goes to the authors of those models.

	This is my 'first' merge model, made just for testing purposes. I don't know what I'm doing, honestly...

	My experience using this in SillyTavern:
	- It advances the story slowly, responding to the last message quite nicely.
	- Creativity is good; it sometimes surprises me with a response close to what I was hoping to get.
	- It may skip ahead in time when the last message includes words that resemble a promise (or that literally mention time).
	- Sometimes it gives a long response, but the length seems to adapt to the overall roleplay, I think...


	## Prompt and Preset

	ChatML works best so far. Llama3 and Mistral prompts work too, but with them the model sometimes speaks for you. (With ChatML it may also speak for you, but not that often - simply re-generate.)
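
For reference, here is a minimal sketch of what a ChatML-formatted prompt looks like when built by hand. The helper name `build_chatml_prompt` and the example messages are made up for illustration; SillyTavern's ChatML preset does this for you. The sketch also assumes the model's tokenizer recognizes the standard `<|im_start|>` and `<|im_end|>` markers, which this card does not confirm.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration only; not part of any library.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>ROLE\nCONTENT<|im_end|>\n
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Example messages (invented for this sketch).
messages = [
    {"role": "system", "content": "You are the narrator of a roleplay."},
    {"role": "user", "content": "Describe the tavern."},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

Ending the prompt on an open assistant turn is what tells the model whose turn it is. If it still writes the user's lines, re-generating as suggested above is the simplest fix.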

	I use the context and instruct templates from [here](https://huggingface.co/Virt-io/SillyTavern-Presets/tree/main/Prompts/ChatML/v1.9) (credits to [Virt-io](https://huggingface.co/Virt-io)).

	[This](https://pastebin.com/4jSq8V4N) is the preset I use for SillyTavern; it should be good enough.
	Tweak to your heart's content:
	- Temperature can go higher (I stopped at 2).
	- "Skip special tokens" may or may not be needed. If the model ends its response with "assistant" or "user", try disabling the checkbox. (I did get this in my first couple of tries, but not anymore. Not sure why...)
	- Context length: so far it stays coherent at 28k tokens, based on my own testing.
	- Everything else is... just fine, as long as you're not forcing it.
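
If you drive the model from your own script instead of SillyTavern, one way to stay inside the roughly 28k-token range mentioned above is to drop the oldest messages first. The sketch below is only an illustration: `trim_history` is a made-up helper, and the default token counter is a crude whitespace split standing in for the model's real tokenizer, so treat its counts as rough.

```python
def trim_history(messages, max_tokens=28_000, count_tokens=None):
    """Keep the most recent messages whose combined size fits in max_tokens.

    Hypothetical helper for illustration. count_tokens is any callable that
    maps a string to a token count; the default is a whitespace split, which
    only approximates what a real tokenizer would report.
    """
    if count_tokens is None:
        count_tokens = lambda text: len(text.split())

    kept = []
    total = 0
    # Walk from newest to oldest so recent context is preserved first.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        # Always keep the newest message, even if it alone exceeds the budget.
        if kept and total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


# Tiny example with an artificially small budget (3 "tokens").
history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
]
trimmed = trim_history(history, max_tokens=3)
print([m["content"] for m in trimmed])
```

To count with the real tokenizer, you could pass something like `count_tokens=lambda text: len(tokenizer(text)["input_ids"])`, assuming `tokenizer` is the one loaded for this model.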


	## 🧩 Configuration

	```yaml
	models:
	- model: Sao10K/MN-12B-Lyra-v1
	- model: Fizzarolli/MN-12b-Rosier-v1
	- model: MarinaraSpaghetti/Nemomix-v4.0-12B
	- model: aetherwiing/MN-12B-Starcannon-v2
	merge_method: model_stock
	base_model: aetherwiing/MN-12B-Starcannon-v2
	dtype: bfloat16
	```

	## 💻 Usage

	```python
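	# Notebook syntax; from a regular shell, run this without the leading "!".
	!pip install -qU transformers accelerate

	from transformers import AutoTokenizer
	import transformers
	import torch

	model = "RozGrov/NemoDori-v0.1-12B-MS"
	messages = [{"role": "user", "content": "What is a large language model?"}]

	# Format the conversation with the chat template shipped with the tokenizer.
	tokenizer = AutoTokenizer.from_pretrained(model)
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# Load the model in half precision and let accelerate place it on the available devices.
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Sample up to 256 new tokens, then print the prompt together with the completion.
	outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
	print(outputs[0]["generated_text"])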
	```