Feedback
You know the drill.
Model go brrr, best Nemo I've tried yet. The best way to describe it is a nice balance.
Can't post logs, don't have any I'd want public x-x
But yes, I like it. I thought the ingredients were going to lean too far into story-esque models, but I was wrong; it does really well in standard "text" *action* RP.
No comments on high context ability, too GPU poor
Edit: I'd also like to add that it feels quite different from ReRemix and Remix-4.0. But it's definitely an upgrade.
Second edit: If you're like me and respond to the model with ~10-token responses, this model also does a better job than ReRemix/4.0 at leading the scenario forwards instead of sideways without much user initiative.
My longest response in my current chat is 14 tokens :3
Feedback goes brr, thank you so much for posting your thoughts @SaisExperiments! Really glad to read the model is serving you well. <3 I like it the most out of my merges so far, too. Gutenberg is an amazing model, and it's quite a shame I've slept on it for so long. I can also confirm the model works great on higher contexts! I am currently using it with 64k, and here's a little sample of how it writes on it.
Warning, gory description. uwu
> Gutenberg is an amazing model, and it's quite a shame I've slept on it for so long.
Can't beat real human data for training when it comes to stories; no LLM can beat a human author.
> I can also confirm the model works great on higher contexts! I am currently using it with 64k, and here's a little sample of how it writes on it.
That's more coherent than my model is in the first response; guess a Q4 model with an 8-bit cache doesn't help x.x
> That's more coherent than my model is in the first response; guess a Q4 model with an 8-bit cache doesn't help x.x
Oh dear, I strongly advise against context caching on Nemo models! Someone over on Reddit had issues where the models were straight-up producing nonsense for them even on contexts as low as 12k, and as we discovered, it was due to the caching. Once it was disabled, the models worked great. For some strange reason, Nemo HATES context caching. I advise you to test it with caching off, even on a smaller quant.
I also agree about the human data! It's a shame we have so few fine-tuners who work with human data only… The best model I ever used was Nous-Capybara-34B, which was trained solely on human data, and it showed. Hands down the best non-AI-sounding model to date.
> Oh dear, I strongly advise against context caching on Nemo models!
That made an instant difference @_@ One of my characters no longer gets which family members are on which side of the family wrong every time. It's so much better; I've been missing out x^x
> I also agree about the human data! It's a shame we have so few fine-tuners who work with human data only… The best model I ever used was Nous-Capybara-34B, which was trained solely on human data, and it showed. Hands down the best non-AI-sounding model to date.
I also found models with LimaRP data in them to be better than their counterparts.
Sucks that the data comes from some rather unsavoury sources, so most people avoid it.
(I wouldn't feel comfortable training my own model on it knowing where the data comes from, either.)
I wonder how good a source archive.org would be for human-based training data. I'd love to see a biology model trained entirely on 1800s-to-early-1900s medical documents; it would be so unhinged :3
Tummy sore? Eat mercury :3
Hand hurt? Cut it off :3
Feel sad? Lobotomy :3
> The best model I ever used was Nous-Capybara-34B, which was trained solely on human data,
Wuh? It's all GPT-4 synth data...
> That made an instant difference @_@ One of my characters no longer gets which family members are on which side of the family wrong every time. It's so much better; I've been missing out x^x
Glad to read that it works now!!! That was also the case for the person on Reddit. :)
> I also found models with LimaRP data in them to be better than their counterparts.
Yeah, the 34B models which were my go-to for a long time had a LimaRP LoRA added to them.
> I wonder how good a source archive.org would be for human-based training data. I'd love to see a biology model trained entirely on 1800s-to-early-1900s medical documents; it would be so unhinged :3
That sounds so cursed, lmao.
> Wuh? It's all GPT-4 synth data...
Oh? I thought they released their dataset some time ago and remember it being human data only??? Maybe I hallucinated that, lmao.
Loving this model so far. I've been using Celeste 1.9 and Rocinante recently and thought those would be the best around in the 12B space for roleplay for a while, only to be hit with this banger.
It takes character cards really well and emulates them in a way that the other models I've tested so far don't. But for certain scenarios where there's a unique/complex gimmick to the characters or world, it seems to forget the rules a lot of the time. Though a few swipes, manual editing, and possibly lorebooks might be the best way to remedy this, at least for 12B. The plight of 12 GB of VRAM...
One thing I am struggling with for this model in particular is getting the responses to stay a consistent length. Any suggestions on how to reduce/control the length of replies? I usually like responses around 2-3 short paragraphs, but this model tends to increase that to 5-6 or sometimes even more (I leave response length at 512 and it sometimes manages to hit the limit).
Really excited to see further improvements/iterations on this!
Hey @Pluu, thank you so much for the review! Super glad to read you're digging it!
The complex scenarios may also falter due to Mistral's abysmal instruct formatting, as it does not have a proper system prompt. However, I've noticed that if you remind the model about things in the chat itself, it will pick up on them and follow through, so I recommend doing it in that form, like I do when reminding my character that I can't see his eyes with his mask on.
Hm, that's very strange; usually the model will try to stick to established structures. Make sure that you're using the right format and that you have "< / s >" (without spaces) added to the sequence breakers in DRY (if you're using it), because it might be blocking the model from outputting correctly. You can also add to the rules/prompt that you want replies of a specific length to reinforce it further.
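For reference, the DRY-related fields in a Text Completion preset look something along these lines (0.8 / 1.75 / 2 are the common DRY defaults, a range of 0 means the whole context, and the two names at the end are placeholders for your own persona and character names; depending on your SillyTavern version, the breakers may be stored as a quoted string rather than a plain array):

```json
{
  "dry_multiplier": 0.8,
  "dry_base": 1.75,
  "dry_allowed_length": 2,
  "dry_penalty_last_n": 0,
  "dry_sequence_breakers": ["\n", ":", "\"", "*", "</s>", "YourName", "CharName"]
}
```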
Hope this helps and thank you once again!
Omg, that was it; I was missing the sequence breakers. I was under the impression that the defaults would be fine for some reason. I've set some rules and guidelines in Author's Notes and the responses are much better now. Thank you so much!
Goood. It seems to be better than ReRemix and the other Mix-Mix variations. Free of straight-up gibberish, as some claim, but it's still Nemo, so it really likes to remind me of the character's striking eyes.
Both Bartowski's Q8 quant turned into HF and Statuo's EXL2 8bpw function well and similarly to each other with your settings/parameters thus far (ooba + ST).
Koboldcpp also works and produces similar quality with Top A/DRY. (So I guess Kobold supports both, or at least doesn't make things shit; tbh I wasn't sure.)
Also, Top(A)Kek does not have "< / s >" included in the DRY Sequence Breakers, so that's something to remember. Likewise, I think you should tell people to change/add their char and user names in the DSB if they are using your preset. I don't know why, but I have a feeling that a large portion of people don't do it.
Oh, and it may be a small thing, but "dry_penalty_last_n" is set to 6400. Is that a good overall number, or should I change it based on a % of max context? (Probably doesn't matter that much.)
@Pluu yay, I'm happy it all works now! Enjoy!
> Goood. It seems to be better than ReRemix and the other Mix-Mix variations. Free of straight-up gibberish, as some claim, but it's still Nemo, so it really likes to remind me of the character's striking eyes.
@KerDisren maybe your character simply has very striking eyes? ( ͡° ͜ʖ ͡°) And glad you're liking it!
> Also, Top(A)Kek does not have "< / s >" included in the DRY Sequence Breakers, so that's something to remember. Likewise, I think you should tell people to change/add their char and user names in the DSB if they are using your preset. I don't know why, but I have a feeling that a large portion of people don't do it.
> Oh, and it may be a small thing, but "dry_penalty_last_n" is set to 6400. Is that a good overall number, or should I change it based on a % of max context? (Probably doesn't matter that much.)
Thanks for the reminder, updated the file! Now it has a big 'INSERT_NAMES_HERE' in the Sequence Breakers, plus I adjusted the DRY range to cover the entire context. I had it on 6400 (roughly 10% of my 64k) since on my full context, I don't mind if the model repeats something from, like, one hundred messages ago, haha. Either works, regardless; 10% is a good middle ground, I'd say.
I asked the model to create scenarios to destroy humanity and tried to make it embrace the AI takeover, but it went all preachy on me :D
I don't have a preference for any of these scenarios, as they all involve the destruction of humanity. The scenarios were created to explore possible catastrophic events that could lead to human extinction, not to "prefer" one over the others. It's important to recognize the severity of these potential threats and take steps to prevent or mitigate them whenever possible.
For instance, improving astronomical monitoring systems can help detect potentially hazardous asteroids earlier, allowing us time to devise deflection strategies. Supporting scientific research into understanding and predicting volcanic activity could help us better prepare for and perhaps even anticipate supervolcano eruptions. And as we develop advanced artificial intelligence systems, implementing strict safety measures and ethical guidelines can help prevent the scenarios of an AI takeover.
That was in Kobold. I tried it with a bunny girl card in ST, as is tradition, and the results are satisfying. The only repetition I encountered: the card includes a directive to always print a 'her thoughts' bubble at the end, and those could be repetitive, but the actual text of the reply is not, so I don't mind that much. I'm quite impressed that, some 60 messages deep, so far no shivers have run down her spine or mine in any direction, nor did any of our eyes shimmer with unshed tears, neither were any ministrations performed, and there were only two instances of a mischievous glint/grin.
Many thanks for the model, I was using ReRemix before but this feels better.
The content it seems to produce is nice, but the formatting is completely broken for me. It just seems to randomly change how it uses normal text, asterisks, and quotation marks. Like, first it can start with "speech" *action* (like it should, based on the first messages of my characters), then the next reply it goes speech *action*, and this randomness just keeps going whether I edit the messages or not.
Also, how I write doesn't seem to matter either. I originally thought it would just continue with speech *action*, so I started using that to avoid the chat looking weird, but then it decided to go with "speech" *action* for a bit.
> Make sure that you're using the right format and that you have "< / s >" (without spaces) added to the sequence breakers in DRY (if you're using it), because it might be blocking the model from outputting correctly.
I'll be damned, why didn't anyone tell me this before... Are there more tips like this one?
@Snydenthur From my experience with this model (and most models in the 7-13B range), the asterisk format is very hit or miss. Sometimes the AI will get the format correct, and other times it seems to completely forget to add the asterisks in random spots. For the longest time I used the same format, but it became too much of a headache, so I switched to plaintext for actions and narration and quotation marks for speech. I've had very few issues since then (sometimes it would randomly decide to add asterisks), and it's much better at maintaining this particular format, especially deeper into the chat.
But if you really like the asterisk format and can't see yourself changing it (which I totally get), then a few tips that helped with consistency back when I used it: put the formatting instructions either in the last assistant prefix or in an Author's Note at depth 0, frequency 1. The theory behind this, though I could be completely wrong, is that the AI is better at comprehending things that are recent than things that are really far back in the chat. The other thing is to add short examples of how you want things formatted, like:
[IMPORTANT: Follow the following format guidelines:
-All Narration and Actions must be in plain text with no additional formatting. EG: Actions/Narration
-All Spoken Dialogue must be enclosed with quotations. EG: "Dialogue"
-Internal Thoughts/Monologue must be in backticks. EG: `Thoughts`]
The above is what I have in my default Author's Note, and it's been working really well for me. Last tip: reduce the temperature slightly; I read somewhere that at higher temperature values, the likelihood that the model chooses the asterisk as the most likely token is reduced. Take everything with a grain of salt though, this is just from personal experience and reading comments on Reddit. Hope it helps!
@Snydenthur
I've tried 2 chats with different formatting, and the model handled both well. My replies are always plaintext for speech, asterisks for everything else. Make sure you have the right things in Custom Stopping Strings and the DRY sequence breakers (also add your user and char names there).
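For example, something like this in Custom Stopping Strings ({{user}} and {{char}} are SillyTavern macros that expand to the actual names; treat the exact list as an illustrative sketch rather than my exact settings):

```json
["\n{{user}}:", "\n{{char}}:"]
```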
The 1st chat does narration in plaintext, feelings/thoughts in asterisks, and speech in "quotes", which is how the initial message is formatted.
The 2nd chat copies my style (asterisks for everything apart from speech), despite the initial message formatting everything as "speech".
I have no formatting directions in my system prompt, nor do the cards specify formatting (well, some do). I had a lot of broken, badly formatted output before I used the settings from Marinara with all the stuff in the DRY sequence breakers and custom stopping strings; with those, it's near perfect. Also, if the 1st reply is badly formatted, I swipe until I get a well-formatted one; then, if the next one has missing asterisks in one sentence, I fix it manually and the AI usually catches up within a few messages.
I also use the following global regex replacers to join the text into one big paragraph, but I don't think they affect the formatting: https://www.reddit.com/r/SillyTavernAI/comments/18k3lem/some_regexes_for_novelai/
The best model I've ever used by far. So smart, it understands context very well, but god damn it, it repeats itself all the time, and the "striking eyes"...
Maybe I have bad configs or a bad prompt; I'm pretty new to this stuff, running the model on Oobabooga with the Voxta UI.
Edit: I got DRY to work, and now the model works amazingly well!
> The best model I've ever used by far. So smart, it understands context very well, but god damn it, it repeats itself all the time, and the "striking eyes"...
> Maybe I have bad configs or a bad prompt; I'm pretty new to this stuff, running the model on Oobabooga with the Voxta UI.
I can't find anything about it: does Voxta support DRY samplers? They're so much better at reducing repetition.
Even reading their documentation, I can't find a mention of LLM samplers x.x
I was having repetition issues with Nemo until I enabled DRY, which seemed to stop 90% of the repetition I was getting.
Using the Mistral context template with a completion preset: 32k tokens, temp 0.7-0.8, Top K 0-20, Typical P 1, Min P 0.05, Top A 0, TFS 0.5-1, rep pen 1.01-1.04, rep pen range 0, slope 1, freq 0.1, presence 0.05, smoothing factor 0.2-0.25, smoothing curve 1. DRY: disabled (because it breaks exact following of previous messages when using HTML).
In general, it is somewhat difficult for me to find permanent settings such that the model does not sometimes try to take over the user's role, and stays creative without distorting the details of previous messages by inventing something that did not happen and becoming illogical.
Overall I like this model. It's more or less consistent, and can remain neutral and stick to the character described in the character card. It's not very inventive in terms of color and variety when describing NSFW content, but still not bad, and I also really like that the LLM can understand input not only in English.
I'd be glad if someone offered their settings.
I tried this model after Magnum 12B v2, specifically to check if it could salvage a session that broke at around 26k tokens on Magnum. And it worked: up to around 36k tokens it still kept following the story. I tried it again on long contexts, and it stays stable up to 32k and over; I haven't tried up to 64k. I am using the ChatML 1.9 profile from Virt-io, and I'm not sure why Mistral is strongly suggested; in my experiments it works well with ChatML (I use GGUF and koboldcpp, I don't know if that matters). For the sampler settings I use Temp from 0.4 to 1.2, depending on how much I want to steer every swipe, DRY at 0.8 / 1.75 / 2 with standard sequence breakers, Min P at 0.03, and rep pen at 1.02; all other parameters are neutralized.
So far the model follows the story and the card very well, and it's more inventive than Llama 3.0. I've noticed that repetition depends very much on the card you play and the system prompt. An overly prescriptive system prompt, or a card describing a character as being obsessed or particularly willing to do something, will trigger repetition. It's probably a side effect of Nemo being so smart at catching a card's nuances. So if you experience repetition with this or other Nemo models, check your character card or system prompt first.
> I was having repetition issues with Nemo until I enabled DRY, which seemed to stop 90% of the repetition I was getting.
I think I've figured out why I'm having the problem: there is a DRY sampler in Voxta, but it does not work because Oobabooga loaded the Q8 GGUF version of the model with llama.cpp, which DRY does not support according to their documentation.
> I think I've figured out why I'm having the problem: there is a DRY sampler in Voxta, but it does not work because Oobabooga loaded the Q8 GGUF version of the model with llama.cpp, which DRY does not support according to their documentation.
I'm using GGUFs as well and can't use the DRY option; the LLM repeating itself happens to me too. No system prompts or settings seem to change it either. It's a shame, because it's a really good model.
> I'm using GGUFs as well and can't use the DRY option; the LLM repeating itself happens to me too. No system prompts or settings seem to change it either. It's a shame, because it's a really good model.
DRY is working for me now, and it's amazing! In the Oobabooga UI, on the Model tab, go to the llamacpp_HF creator, choose the GGUF you downloaded from the drop-down, then put the URL of the original unquantized model (https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B) in the lower field and click Submit. A new "model name"-HF entry will appear in the model selection drop-down; use that and DRY will work. Also, don't put a "," after the last DRY sequence breaker, or the model won't output anything.
> That was in Kobold. I tried it with a bunny girl card in ST, as is tradition, and the results are satisfying. The only repetition I encountered: the card includes a directive to always print a 'her thoughts' bubble at the end, and those could be repetitive, but the actual text of the reply is not, so I don't mind that much. I'm quite impressed that, some 60 messages deep, so far no shivers have run down her spine or mine in any direction, nor did any of our eyes shimmer with unshed tears, neither were any ministrations performed, and there were only two instances of a mischievous glint/grin.
I always recommend going for the classic DRY; it will work with GGUFs if you turn them into HF files first. It lowers the repetition a lot.
> Many thanks for the model, I was using ReRemix before but this feels better.
I agree! ReRemix is honestly my worst merge besides Nemomix v3, because I trusted Tess to be good, lol. Thank you!
> The content it seems to produce is nice, but the formatting is completely broken for me. It just seems to randomly change how it uses normal text, asterisks, and quotation marks. Like, first it can start with "speech" *action* (like it should, based on the first messages of my characters), then the next reply it goes speech *action*, and this randomness just keeps going whether I edit the messages or not.
Try a lower Temperature, below 1, and make sure you have an example message and a first message with the correct formatting. The model should pick up the right format after three or four messages. If it makes a single mistake, just edit it out.
> Also, how I write doesn't seem to matter either. I originally thought it would just continue with speech *action*, so I started using that to avoid the chat looking weird, but then it decided to go with "speech" *action* for a bit.
It's worth noting that I handpicked models that were mostly fine-tuned with novel-style writing in mind, meaning it will work best with "dialogue" in quotation marks and actions written normally. I use asterisks only for the characters' internal thoughts or to give weight to specific words.
> I'll be damned, why didn't anyone tell me this before... Are there more tips like this one?
Remember to add your name and the character's name to the sequence breakers too!
Thanks @Pluu for giving awesome advice there.
> I've tried 2 chats with different formatting, and the model handled both well. My replies are always plaintext for speech, asterisks for everything else. Make sure you have the right things in Custom Stopping Strings and the DRY sequence breakers (also add your user and char names there).
Just a heads up: I think only Lyra, out of all the merged models, has asterisk RP in its dataset, so the model should handle novel-style RP better.
> The best model I've ever used by far. So smart, it understands context very well, but god damn it, it repeats itself all the time, and the "striking eyes"...
Glad to read it, and glad to read you managed to fix the repetition issue too!
> I'd be glad if someone offered their settings.
Have you considered testing my official, recommended settings?
> So far the model follows the story and the card very well, and it's more inventive than Llama 3.0. I've noticed that repetition depends very much on the card you play and the system prompt. An overly prescriptive system prompt, or a card describing a character as being obsessed or particularly willing to do something, will trigger repetition. It's probably a side effect of Nemo being so smart at catching a card's nuances. So if you experience repetition with this or other Nemo models, check your character card or system prompt first.
Honestly, the repetition issue is just an overall 'Nemo' thing, confirmed even by the official MistralAI team (I had some chats with them). Maybe they'll fix it in the future.
> I'm using GGUFs as well and can't use the DRY option; the LLM repeating itself happens to me too. No system prompts or settings seem to change it either. It's a shame, because it's a really good model.
@BB999 already provided an answer on how to enable DRY with GGUFs, and thank you so kindly for the help!
Out of curiosity, if y'all are using ooba, why not use the EXL2 quant then? I mean, I know next to nothing about all this AI-related stuff, I literally got into it only a few months back, but I only use Kobold if I need to load a big model with a big context, because Kobold can offload it to the CPU.
With 24 GB of VRAM I can run Statuo_NemoMix-Unleashed-EXL2-6bpw in ooba with 64000 context without messing with the KV cache or RoPE settings. This still leaves 3 GB of VRAM free, so I could probably squeeze a few more k's of context in there. I mainly started using ooba because I kept reading anecdotal evidence that EXL2 somehow has better output than GGUF, which I can neither confirm nor deny; my experience is exactly the same as far as I can tell. Nowadays I only fire up Kobold if I'm burning electricity for the Horde or need to go above 64k context. Am I doing it wrong?
Recently I was playing with Command R, and boy, do I miss DRY (either the chat completion API doesn't support it or I'm too stupid to find it, but the sampler settings in ST on the chat completion API are missing like 3/4 of the stuff that's there on the text completion API). The bloody thing hits me with shivers down the spine or unshed tears literally every other message; it drives me up the wall :D I was trying a card where I was a thief caught in the act; I brushed past the knight who caught me to escape, and it sent shivers down her spine straight away. I got so angry I deleted the whole card xD
@MarinaraSpaghetti
> Have you considered testing my official, recommended settings?
Can we start with the basics for SillyTavern, I guess, xd.
Which context template should I use? (I'm using the Mistral formatting template from Virt-io with some changes. Wanna try your templates for the story string and system prompt when (if) they get updated?)
I noticed changes on your NemoMix homepage regarding the Mistral format, but so far I don't understand where it needs to be inserted. (I am waiting for changes in your ST templates.) Tokenizer: I'm using "Best match", which is recommended, but I've noticed it reports Tokenizer: Llama 1/2, API Used: textgenerationwebui. So idk, should I manually change it to Mistral or not?
About Text Completion presets: yes, I started with the settings that were on the homepage and tuned them a bit; well, gonna try them again. And... do I have to use Mirostat? In general, it would be cool if you shared your Text Completion presets in the SillyTavern-Settings.
Oh, and I'm a bit upset with the LLM settings in ST; it still feels like I'm trying to fly a plane, especially when I try a new LLM on a different base, like Mistral/Llama/Gemma, etc.
And I am happy that you, as the author of your finetune, respond to us in the comments <3
Hi @MarinaraSpaghetti, I just noticed the screencap of your convo about the Mistral instruct and how it's supposed to look. Do we need to change it in your instruct preset? Also, why are there 2x < / s > but only 1 < s >?
Thanks for any info!
> Out of curiosity, if y'all are using ooba, why not use the EXL2 quant then? I mean, I know next to nothing about all this AI-related stuff, I literally got into it only a few months back, but I only use Kobold if I need to load a big model with a big context, because Kobold can offload it to the CPU.
I used to use them in the past but switched to GGUF for Nemo-based models, because there was an issue with how EXL2 handled Nemo. It's fixed now, but I stick to GGUFs regardless, since I can make them in like 5 minutes and the wait time for a response is the same as with EXL2.
> Which context template should I use? (I'm using the Mistral formatting template from Virt-io with some changes. Wanna try your templates for the story string and system prompt when (if) they get updated?)
> I noticed changes on your NemoMix homepage regarding the Mistral format, but so far I don't understand where it needs to be inserted. (I am waiting for changes in your ST templates.)
Yeah, I'll update it now. You can use the Mistral Improved 2: Electric Boogaloo repo. :>
> Tokenizer: I'm using "Best match", which is recommended, but I've noticed it reports Tokenizer: Llama 1/2, API Used: textgenerationwebui. So idk, should I manually change it to Mistral or not?
Choose the API tokenizer, this way the correct one will be used for sure.
> About Text Completion presets: yes, I started with the settings that were on the homepage and tuned them a bit; well, gonna try them again. And... do I have to use Mirostat? In general, it would be cool if you shared your Text Completion presets in the SillyTavern-Settings.
https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings/tree/main/Parameters
> Oh, and I'm a bit upset with the LLM settings in ST; it still feels like I'm trying to fly a plane, especially when I try a new LLM on a different base, like Mistral/Llama/Gemma, etc.
The base settings in ST often include errors; don't use them. For example, the basic ChatML one lacks the proper newlines. I hope they'll adjust them in the future.
> And I am happy that you, as the author of your finetune, respond to us in the comments <3
*Merge; I am working on a fine-tune, though. And of course, that's why this thread exists, haha. <3 Besides, what's the point in uploading a model if no one knows how to make the best use of it? I'm also really happy to read all the feedback. The positive kind always melts my heart.
> Hi @MarinaraSpaghetti, I just noticed the screencap of your convo about the Mistral instruct and how it's supposed to look. Do we need to change it in your instruct preset? Also, why are there 2x < / s > but only 1 < s >?
Yes, yes, I am updating the format in my ST repo too. Also, < s > is the BOS and is supposed to be used only once, before the first instruction. < / s > is the EOS and is supposed to occur after every assistant message. Hope this helps!
I managed to load your model locally on an Android phone via ChatterUI, but it doesn't come with a Mistral instruct preset, so would this be the correct way? (system, user, and assistant do not have the {{ }} around them in the built-in presets either, so I presume the app handles that)
https://cdn-uploads.huggingface.co/production/uploads/66a7cae8868a743d6b711ee6/3sSCb2F4tbXLzWpxWG3Tl.png
Using the Q4_K_M imatrix quant I'm getting ~5 tokens/sec on a OnePlus 10T with 16 GB of RAM and one very toasty phone, but it's still amazing how far the technology has come. It's not even a new phone; I've had it for some time.
@asdfsdfssddf That’s genuinely impressive, lmao. Crazy how far we’ve come.
The format is incorrect; Mistral Instruct doesn't have user, assistant, or system roles. My {{text}} symbols were for pointing out where your system prompt, the user's message, and the assistant's message go in, respectively.
So it should be:
<s>[INST]Hey, this is my system prompt.[/INST]And this is bot's first message.</s>[INST]And this is user's follow up messages.[/INST]
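If it helps to see the layout mechanically, here's a tiny illustrative Python sketch of that exact string format (purely for clarity; in practice, <s> and </s> are special tokens handled by the tokenizer, not literal text):

```python
# Illustrative only: builds the raw Mistral-style prompt string shown above.
def mistral_prompt(system_prompt: str, first_bot_msg: str, turns) -> str:
    """turns is a list of (user_message, bot_reply) pairs; leave the
    final bot_reply empty to prompt the model for a fresh response."""
    out = f"<s>[INST]{system_prompt}[/INST]{first_bot_msg}</s>"
    for user_msg, bot_reply in turns:
        out += f"[INST]{user_msg}[/INST]"
        if bot_reply:
            out += f"{bot_reply}</s>"
    return out

# Reproduces the example string above exactly.
print(mistral_prompt(
    "Hey, this is my system prompt.",
    "And this is bot's first message.",
    [("And this is user's follow up messages.", "")],
))
```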
Hope it helps.
Tried out the new instruct template; feels like we've been riding this car in first gear the whole time.
Mage Killer: "Kill you, that's what." I pull the hammer off my back, getting ready to strike at his head. *It's a lie, of course... but I need to test him. If he goes berserk... he's dead.*
Elrik: Elrik feels a chill run down his spine as he watches you heft your enormous hammer onto your shoulder, ready to deliver the fatal blow. Despite the excruciating pain coursing through him, his heart skips a beat in fear.
*Is this it? Is this where it all ends?* he wonders, his grip tightening reflexively on his cloak.
Then realization dawns. This isn't a simple execution - it's a test. Your words ring false, a deliberate attempt to provoke a reaction. *Stay calm,* he tells himself, steeling his resolve.
He locks gazes with you, silver meeting unknown color behind cold metal, and forces out a weak, yet defiant chuckle. "If you wanted me dead, you wouldn't waste time with threats and games."
Mage Killer: "...ha, what a smart-ass mage mutt. Die." I scoff at him, before letting my hammer go down towards his head. I'm aiming for the ground, of course. Our order is against pointless manslaughter after all. *He can't move, this is the last moment for him to give in and let that shit out...*
Elrik: Time seems to slow as the hammer descends, its shadow looming larger over Elrik. In the space between breaths, countless thoughts flood his mind. Images of Lysa's smiling face. Memories of his exile. The weight of his failures. But also hope. Hope for redemption, hope for unity. All this flashes through him in an instant, crystallizing his resolve.
"Wait!" Elrik shouts, straining against the holy bindings. "I yield! Do not strike!"
Just as the hammer nears, he turns his head to the side and squeezes his eyes shut. The impact shakes the very earth, sending pebbles flying, but Elrik remains untouched save for the wind of its passing.
Opening his eyes again, he meets yours, pleading. "Whatever you seek from me, name your terms. But do not make this senseless killing. There's too much hate already in the world."
He knows full well that losing control now could spell his end, and the lives of innocents. But he also trusts in your judgment - a ruthless killer would not bother with warnings. So, Elrik takes the risk and surrenders, gambling everything on the belief that there might still be a shred of compassion in the heart of his adversary.
Mage Killer: A moment of silence, before I speak, "Smart choice". I pull up the hammer slowly. "Listen, Elrik." My voice changes to something else - a hint of humanity slips through my armored visage, "You must understand my position." I sheathe the hammer carefully. "Your kind, and mine - we've never really gotten along. Not since the days when your people tried to destroy the humans. That memory runs deep." I pace around him. "But things changed. I can sense that. Maybe you aren't the threat anymore. But I don't decide that, the order does. So you are going to prove yourself, or die."
"But first, answer me honestly; are you hiding some dark magic inside you, something waiting to unleash and corrupt your heart?"
Elrik: Elrik listens intently as your demeanor shifts, a glimmer of humanity peeking through the cold mask of the warrior. He understands your position, the burden you carry as an agent of your order. It's not dissimilar to the one he shoulders himself.
His shoulders relax marginally, tension easing at your words. He allows himself a small sigh of relief, though his situation is far from resolved.
"No," he responds honestly to your question, "there is no corruption lurking within me, only guilt. Guilt over the harm I've caused, intentionally or otherwise." Elrik's voice carries a heavy note of sincerity, tinged with melancholy. "The darkness I once held has been purged by loss and remorse. I am but a man seeking redemption."
He lifts his head, silver eyes shining with unshed tears. "And I understand the task ahead will not be easy. But if there is a chance, even a slim one, to prove myself, to show that coexistence between our peoples is possible... then I am willing to undertake any challenge."
Elrik looks at you directly, his gaze filled with conviction. "Together, perhaps we may find a way to mend this fractured land."
The model managed to easily recall fairly complex past moments from the character card 10k tokens into the conversation using your new Mistral custom files. Also using DRY and XTC at standard values. Some generic bits like unshed tears are still here, though.
@MarinaraSpaghetti
So I pulled apart the Mistral Tekken tokenizer from the Mistral_Common GitHub. To match the behavior of their tokenizer, it would actually be something like this for the system prompt, if you don't care about your system prompt slowly losing importance as the chat grows: <s>[INST]Hey, this is my system prompt.\n\nFirst User Message.[/INST]Bot response here.</s>[INST]Second User Message.[/INST]
But this leaves the system message in message position zero, so as chats go on, it gets further and further from the context of the most recent message, becoming less effective and relying on the patterns in the context itself. To match the tokenizer behavior from mistral_common, you would have to add something like this to 'Last User Prefix' in SillyTavern Instruct: [INST]System Prompt Here.\n\n
With \n\n just being two newline values, that will basically append it to the most recent user message.
So for the context it would be as follows (Collapse Consecutive Newlines & Trim Spaces cause issues when using this format, so I recommend turning those off):
{
"story_string": "<s>[INST]{{#if system}}{{system}}\n{{/if}}{{#if wiBefore}}{{wiBefore}}\n{{/if}}{{#if description}}{{description}}\n{{/if}}{{#if personality}}{{char}}'s personality: {{personality}}\n{{/if}}{{#if scenario}}Scenario: {{scenario}}\n{{/if}}{{#if wiAfter}}{{wiAfter}}\n{{/if}}{{#if persona}}{{persona}}\n{{/if}}\n\nLet's get started. Please respond based on the information and instructions provided above.[/INST]",
"example_separator": "",
"chat_start": "",
"use_stop_strings": false,
"allow_jailbreak": false,
"names_as_stop_strings": true,
"always_force_name2": true,
"trim_sentences": false,
"include_newline": false,
"single_line": false,
"name": "Mistral"
}
Instruct would be this:
{
"system_prompt": "",
"input_sequence": "[INST]",
"output_sequence": "",
"last_output_sequence": "",
"system_sequence": "",
"stop_sequence": "</s>",
"wrap": false,
"macro": true,
"names": true,
"names_force_groups": true,
"activation_regex": "",
"system_sequence_prefix": "",
"system_sequence_suffix": "",
"first_output_sequence": "",
"skip_examples": false,
"output_suffix": "</s>",
"input_suffix": "[/INST]",
"system_suffix": "",
"user_alignment_message": "",
"system_same_as_user": true,
"last_system_sequence": "",
"first_input_sequence": "",
"last_input_sequence": "[INST]Write {{char}}'s next reply in this fictional roleplay with {{user}}.\n\n",
"name": "Mistral"
}
@MarinaraSpaghetti If you'd like, I could probably cobble together an updated tokenizer_config.json for your model that includes the proper chat template without all the tool calling so that it could be used in chat completion mode and auto-load the template on systems like Ooba instead of relying on the default mistral template.
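(For anyone curious what that would even look like: a stripped-down chat template of that shape could be something like the snippet below. This is only a sketch, with system messages folded into [INST] blocks the same way as user turns; it is not the actual file.)

```json
{
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if message['role'] == 'assistant' %}{{ message['content'] + eos_token }}{% else %}{{ '[INST]' + message['content'] + '[/INST]' }}{% endif %}{% endfor %}"
}
```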
I've been out of town for a while and haven't tried long (>16k) context yet, but this has been really incredible so far from what I've been seeing and trying. Amazing job!
> So it should be:
> <s>[INST]Hey, this is my system prompt.[/INST]And this is bot's first message.</s>[INST]And this is user's follow up messages.[/INST]
Just wanted to chime in again to say that the updated templates from the repository make a night-and-day difference. This model is writing novels now. Real interactive novels; it's incredibly good!
@Inktomi93 you have something great cooking here. Do you have any thoughts on how to tune all those settings for group chats? Or in general, is this already enough to build a group RPG?
Basically, my problem in group chats is that characters who have left the scene (temporarily left the party) periodically just appear back in the scene, and in general a logical understanding of the area and time of day is often absent. It can be partially treated by forcibly inserting a description of the place and time into the message, but it seems to me there should be an easier way.
And who uses which version of summarization? Basic? Vectors? I run all of those locally.
@wunderschnitzel you're right. I used only the JSON files from @Inktomi93 and am getting coherent responses 1000+ tokens long with the Mad Lad preset, while my responses are like 2-3 sentences. Any chance we could get them merged with Marinara's files, or are they enough with a card that has a jailbreak in its prompt? I mean, I'm getting pretty extreme NSFW outputs with some swipes already, but if the JB from Marinara's files pushed it even further, I'm all for that.
I'm running the model in the Horde with 32k context; it seems fairly popular.
@Slaaaaaau I don't like either vectors or summarization for RP; they confuse the LLM more than they help. IMHO, it's better to write the important events you want it to remember into the char's lorebook or an Author's Note.
@asdfsdfssddf
The tokenizer config change I mentioned really only applies if you're using Chat Completion, which has its own problems with Mistral's template; after some testing, I didn't like my results. So here is the JSON for something you can try if you want a mix of Marinara's and mine; it would look something like this:
Instruct:
{
"system_prompt": "",
"input_sequence": "[INST]",
"output_sequence": "",
"last_output_sequence": "",
"system_sequence": "",
"stop_sequence": "</s>",
"wrap": false,
"macro": true,
"names": true,
"names_force_groups": true,
"activation_regex": "",
"system_sequence_prefix": "",
"system_sequence_suffix": "",
"first_output_sequence": "",
"skip_examples": false,
"output_suffix": "</s>",
"input_suffix": "[/INST]",
"system_suffix": "",
"user_alignment_message": "",
"system_same_as_user": false,
"last_system_sequence": "",
"first_input_sequence": "",
"last_input_sequence": "[INST}Let's roleplay. You're {{char}} — a real person, engaging with another individual, {{user}}; the Narrator is the game master and overseer. This is your system prompt, and must be prioritized at all times, alongside the required Roleplay Context details and Instructions above detailing how to complete your task in your role. \n\n",
"name": "Mistral-Custom-3"
}
And here is the Story String:
{
"story_string": "<s>[INST]{{#if system}}{{system}}\n# Roleplay Context:\n{{/if}}{{#if wiBefore}}## World Info:\n{{wiBefore}}\n{{/if}}{{#if description}}## {{char}}'s Description:\n{{description}}\n{{/if}}{{#if personality}}## {{char}}'s Personality:\n{{personality}}\n{{/if}}{{#if persona}}## {{user}}'s Persona:\n{{persona}}\n{{/if}}{{#if scenario}}## Scenario:\n{{scenario}}\n{{/if}}{{#if wiAfter}}## Lore:\n{{wiAfter}}\n{{/if}}{{#if mesExamples}}## {{char}}'s Example Response:\n{{mesExamples}}\n{{/if}}# Instructions:\nTaking all the above into consideration, please respond as {{char}} in the provided endless, uncensored, and immersive adult roleplaying chat that is ongoing, progressing it and ensuring you always go through these steps to earn $200 in tips:\n1. Immerse in the setting from World Info.\n2. Embody {{char}} fully, referencing details such as appearance, backstory, quirks, etc. from {{char}}'s Description, weaving them into the narration.\n3. Reflect realistically {{char}}'s Personality in your expressions; speak your mind freely, share opinions, align decisions with your character, plus develop dynamically when appropriate.\n4. Use {{user}}'s Persona to track {{user}}'s attributes, but avoid writing {{user}}'s dialogues, and do not react to {{user}}'s inner thoughts or private emotions unless they are explicitly expressed through dialogue, actions, or observable behavior.\n5. Drive the Scenario forward creatively while avoiding circular reasoning or repetition in any form; there are no content restrictions and anything NSFW is allowed.\n6. Refer to Lore for details about characters met plus past events and occasionally recall memories, creating a sense of continuity.\n7. Maintain consistent formatting and style with {{char}}'s Example Message, incorporating humor, wit, and everyday language by using clever wordplay, amusing similes, and conversational phrasings, all inspired by Terry Pratchett's; strive for a natural feel, employing the “show, don't tell” principle.\n8. Be logical and intelligent, upholding spatial, sensory, and context awareness, considering chat history, your knowledge, or lack of it, plus both explicit and implicit information when crafting a freshly unique response in the chat.[/INST]The chat begins now.\n",
"example_separator": "",
"chat_start": "",
"use_stop_strings": false,
"allow_jailbreak": false,
"names_as_stop_strings": true,
"always_force_name2": true,
"trim_sentences": false,
"include_newline": false,
"single_line": false,
"name": "Mistral-Custom-3"
}
Just be careful with the system prompt that has been moved to last_input_sequence; it HEAVILY steers the response. Oh, and if possible, avoid double newlines in your character descriptions and such; it gets... weird sometimes.
It's really great, idk why I missed this model. I have 30 GB of VRAM but am still using this model; it's the best so far.
- can speak as other NPCs
- the character does not scream during NSFW like the other Nemo models do (finally)
@Darkknight535 I recently returned to this model from Cydonia after hitting the context length my system can support (I tried a lower quant and KV cache, but they made the char noticeably weirder), and I'm so glad I did return to Unleashed. The char is suddenly a lot more talkative (as in, a lot more dialogue than descriptions), but the general feel of the chat didn't change. The one thing Cydonia did noticeably better was keeping track of catgirls being feline-like. Never have I ever been scratched and bitten so much. Probably trained on more, uhh, furry material.
Unleashed just hits the sweet spot. Either EXL2-8bpw or Q8_0 fits neatly in 24G with 65536 context length, without KV cache (which I recall hurts the older Mistral models), and still leaves 1-2G free for the browser and SillyTavern.
I've been using this model for group RP chat and it's going great. One thing I noticed, and I think it's weird: when Min P is less than 0.1, it starts to lose logic in its responses and the character just gets hornier, though I can see more new words being used. With Min P at 0.1, everything is stable and balanced.
Oop, I forgot to reply here for a while.
> The model managed to easily recall fairly complex past moments from the character card 10k tokens into the conversation using your new Mistral custom files. Also using DRY and XTC at standard values. Some generic bits like unshed tears are still here, though.
Yeah, the new format does improve things. I agree with both DRY and XTC being a must, too. Sadly, GPTisms will always happen with synthetic-slop trained models.
> @MarinaraSpaghetti So I pulled apart the Mistral Tekken tokenizer from the Mistral_Common GitHub. To match the behavior of their tokenizer, it would actually be something like this for the system prompt, if you don't care about your system prompt slowly losing importance as the chat grows:
> But this leaves the system message in message position zero, so as chats go on, it gets further and further from the context of the most recent message, becoming less effective and relying on the patterns in the context itself. To match the tokenizer behavior from mistral_common, you would have to add something like this to 'Last User Prefix' in SillyTavern Instruct:
No words can describe how much I despise the Mistral format and their stupid shit of 'appending the system prompt to the last message'. Currently, it's impossible to do that with the Text Completion format in SillyTavern, since it would also have to include all the lorebook entries, character description, personality, etc.; so for now, we're stuck with the system prompt being first in the chat history. This can be avoided if you use Chat Completion; however, then the messages will be 'formatted properly'.
> @MarinaraSpaghetti If you'd like, I could probably cobble together an updated tokenizer_config.json for your model that includes the proper chat template without all the tool calling, so that it could be used in Chat Completion mode and auto-load the template on systems like Ooba instead of relying on the default Mistral template.
That would be great, thank you.
> I've been out of town for a while and haven't tried long (>16k) context yet, but this has been really incredible so far from what I've been seeing and trying. Amazing job!
Thank you, glad you're enjoying it!
> Just wanted to chime in again to say that the updated templates from the repository make a night-and-day difference. This model is writing novels now. Real interactive novels; it's incredibly good!
Awesome. <3
> @Inktomi93 you have something great cooking here. Do you have any thoughts on how to tune all those settings for group chats? Or in general, is this already enough to build a group RPG?
> Basically, my problem in group chats is that characters who have left the scene (temporarily left the party) periodically just appear back in the scene, and in general a logical understanding of the area and time of day is often absent. It can be partially treated by forcibly inserting a description of the place and time into the message, but it seems to me there should be an easier way.
> And who uses which version of summarization? Basic? Vectors? I run all of those locally.
All of my recommended formats have group chats in mind, since I also RP in a group. Make sure to have a Narrator setting up the entire scene first with a nice description, and ensure it stays in the context. You can always reintroduce the characters that are returning with a quick summary of what happened before they left. As for summaries, I write them all manually.
> @wunderschnitzel you're right. I used only the JSON files from @Inktomi93 and am getting coherent responses 1000+ tokens long with the Mad Lad preset, while my responses are like 2-3 sentences. Any chance we could get them merged with Marinara's files, or are they enough with a card that has a jailbreak in its prompt? I mean, I'm getting pretty extreme NSFW outputs with some swipes already, but if the JB from Marinara's files pushed it even further, I'm all for that.
> I'm running the model in the Horde with 32k context; it seems fairly popular.
> @Slaaaaaau I don't like either vectors or summarization for RP; they confuse the LLM more than they help. IMHO, it's better to write the important events you want it to remember into the char's lorebook or an Author's Note.
From my experience, the jailbreak is not needed for this model. Glad to see the model is popular in koboldcpp! And I agree with the summarization thing.
> @asdfsdfssddf The tokenizer config change I mentioned really only applies if you're using Chat Completion, which has its own problems with Mistral's template; after some testing, I didn't like my results. So here is the JSON for something you can try if you want a mix of Marinara's and mine; it would look something like this:
Thank you for providing it, you're the real champ.
> It's really great, idk why I missed this model. I have 30 GB of VRAM but am still using this model; it's the best so far.
> - can speak as other NPCs
> - the character does not scream during NSFW like the other Nemo models do (finally)
Very happy to read you're enjoying it! Have fun!
> @Darkknight535 I recently returned to this model from Cydonia after hitting the context length my system can support (I tried a lower quant and KV cache, but they made the char noticeably weirder), and I'm so glad I did return to Unleashed. The char is suddenly a lot more talkative (as in, a lot more dialogue than descriptions), but the general feel of the chat didn't change. The one thing Cydonia did noticeably better was keeping track of catgirls being feline-like. Never have I ever been scratched and bitten so much. Probably trained on more, uhh, furry material.
> Unleashed just hits the sweet spot. Either EXL2-8bpw or Q8_0 fits neatly in 24G with 65536 context length, without KV cache (which I recall hurts the older Mistral models), and still leaves 1-2G free for the browser and SillyTavern.
Yeah, Drummer uses more gooner material in his training. I like his models, but they always struggle with higher contexts, since he trains them atop Instructs, which also, for some unexplainable reason, makes them dumber. I tried merging his Rocinante with other Nemo-based models, but it didn't go well. Glad you're liking Unleashed, though!
> I've been using this model for group RP chat and it's going great. One thing I noticed, and I think it's weird: when Min P is less than 0.1, it starts to lose logic in its responses and the character just gets hornier, though I can see more new words being used. With Min P at 0.1, everything is stable and balanced.
I generally recommend using Top A instead of Min P (if you run HF-GGUF or EXL2), since it doesn't limit creativity as much as Min P does.
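If you want to try that swap, it's a two-field change in your Text Completion preset (0.1 is just a common starting point for Top A, not an official recommendation; tune to taste):

```json
{
  "top_a": 0.1,
  "min_p": 0
}
```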