Feedback
If my Reddit account weren't suspended for some reason, I'd be recommending this model to everyone.
Right now I'm using the IQ2_M quant with 24k context and a 4-bit KV cache, and this is the best model I have tried for roleplay (I can't run Behemoth due to its size).
Initially, back in Llama 2 times, I used 7B, 9B, and 11B models via exllama. Then I tried Mixtral and was so fascinated by MoE capabilities and speed that I made a lot of frankenMoE merges in Llama 3 times. Then for a month or two I stuck to the "free API" CommandR+, and I sometimes use it even now.
Now I've bought myself a P40, and with my weird 36GB VRAM RTX 3060 + P40 setup I decided to try big models. Initially some from Sao, but his models were too focused on eRP (or maybe my settings were just bad). Then I tried Magnum 72B V2 and was amazed by its quality; when Magnum 72B V4 came out I stuck to it for some time before trying Nemotron 70B and its finetunes like Nautilus and Sunfall. Honestly, I wasn't impressed by Nautilus, but Sunfall was great and amazed me almost every day, so it became my daily driver.
However, even the IQ2_M quant of Endurance 100B amazed me with its quality. First, it has much higher emotional intelligence; I haven't seen a better model for good drama. Second, its ability to remember things: I often want to see how the LLM understands what's happening in the roleplay, so I write something like "Stop the roleplay, analyze it, write an essay, split it into subtopics" or "Stop the roleplay, analyze {{user}}, write an essay, split it into subtopics". CommandR+ was the only one that could break out of character and write the analysis on the first try without any crutches, and Nemotron and its finetunes were also great at this. Endurance's analysis was something else, though: not only did it break out of character on the first try, its analytical capabilities were the best among all the models I've tried for RP purposes... and I'm not even sure I'm using fitting settings.
The only "bad" thing i can say is that my speed dropped by ~1t/s compared to ~70B models, but that was expected, if using row split wouldn't turn my output into garbage maybe that would be better, anyway I'll try row split later, koboldcpp updated many times so i have hope
Well, what can I say... Bravo, just bravo.
Hm, maybe I should try this too?
I have a similar configuration: P40 + 1080 Ti 11GB.
I thought it wouldn't fit, but since you're fitting it in 36 GB and it works well, I'll try it now too.
Thanks for the recommendation. And if this model really is that good, it's amazing that we have something for that VRAM range.
You probably won't be able to run 24k context with 1GB less VRAM; try 16k with koboldcpp, or run 18~20k context by using llama.cpp.
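Something like this is what I'd try first on a P40 + 1080 Ti setup; the split ratio and the 16k context are just my guesses, and the flag names are from recent koboldcpp builds, so adjust to whatever fits on yours:

```
# rough guess for a 24GB + 11GB split, not a verified config
koboldcpp --model Endurance-100B-IQ2_M.gguf --usecublas --gpulayers 99 \
  --tensor_split 24 11 --contextsize 16384 --flashattention --quantkv 2
```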
However, it seems like the model has some problems with monster girls. For example, it described a naga as having a "human body above her neck" rather than just a human upper body; a more detailed description in the card/persona solved the problem, though. Maybe the model needs additional healing on a larger dataset? Or a general fine-tune before the RP tune? What are fitting settings for Endurance?
Also, it seems like the model has great "long term" memory, which I noticed instantly, but bad "short term" memory. For example, it couldn't track which pieces of clothing had been taken off and kept making things up to the point where I wasn't able to follow anymore.
Half of the time it fails at numbers: if three corpses are mentioned, one from group A and two from group B, it'll say there's only one A corpse and one B corpse... Or birthdays: I mentioned that my character "was 8 when the apocalypse happened", but in one of the analysis reswipes I was told that my character "was born 8 years old". I don't know how or why that happened, tbh.
Anyway, such problems aren't exactly rare to see, and I'd probably get rid of half of them the moment I find a better preset for Endurance. So, thanks for the release! Despite the mentioned problems it's still an awesome model.
Ahem, ahem. Dear Santa Drummer, can I ask, would you make an Endurance v1.1 anytime soon?
🥺
👉👈
Unfortunately, it seems like I can't run even 1024 context with IQ2_M, maybe because Windows is eating around 400MB of VRAM on the 11GB card. I tried different params and FA without luck.
Downloading the IQ2_XS quant, hoping it will still be good.
(running with koboldcpp)
Even with a quantized KV cache it doesn't work.
Hm, but it runs relatively okay on CPU. I will try IQ2_XS on GPU and try the original Behemoth / this model at a higher quant on CPU.
IQ2_XS is smart at first glance. Will compare with Q4_K_M Behemoth.
Non-English performance is nuked, btw (either by the English-calibrated low imatrix quant or by the layer removal).
Great model. I think this is finally the nail in the MidnightMiqu coffin for 48GB VRAM. Even though I've only tested it with one card, it feels like a smarter version of MidnightMiqu that retains its creativity while picking up on more nuances.
Just finished a somewhat complex scenario with dubious moral implications (the best kind).
90% of models will either have the character screaming and yelling at you like some r*pe victim (which it is not, by a mile) or go full S E X O on it. I have not tried Midnight Miqu on it yet, but I have a feeling it would just spiral into a full-blown soap-opera kind of drama.
That said... this model did none of that. The sheer emotional intelligence took me by complete surprise: the girl, whose character is wild and aggressive, slowly softened as a real connection formed between her and her fiancé while they spent the night going against her parents' wishes and just cuddling to sleep.
I love how genuinely graceful the slope of her emotional progression was. She didn't give in right away, nor did she fight until blood came out (it did happen the next morning, though; lowering the temperature worked wonders to fix it). At some point I even added some side characters and it held up marvelously!
There was even a lot of playing around with the slop, reforming it to fit the context of the scene. Love it!
This is now my favorite model, no questions asked. The dream of running it locally is the only thing left before this becomes a genuinely useful writing assistant for me.
@Olafangensan Thank you! If you're not running it locally, you might want to run Behemoth 123B instead.