Keep us posted on these experiments.
Unhinged Llama is the best Llama.
Will do, it's mostly a giant experiment at the moment. My ideal final model would be one that suggests building a bomb instead of having to be asked how to make one (if that's possible). I want a model that can be included in a merge to add a touch of insanity to the mix.
I'm currently trying to figure out why any model I base on Lumimaid (and the original Lumimaid) has a hard time with bilingual characters.
Like if a character can speak both Spanish and English: if the first message is in Spanish it will just latch onto Spanish and won't speak English again. The things Llama 3 does are always interesting 😸
Even though I'm bilingual I only do English for roleplays so can't really share any experiences on that.
I just throw some in there cause I learnt a little in school, plus it's a good test for models.
But I do love the attitude SOVL-style models have in instruct scenarios.
It reminds me of the Llama 3 Instruct personality Meta tried to give the model, but unhinged :3
This card is great xD
158 tokens of perfection; the card is so simple the model makes a huge difference.
There's no description of how to act or what their personality is, so it's really kind of a wildcard.
Amazing with some models, terrible with others. Mistral models never did any good with it, Solar did okay, but Llama 3 models are amazing with it.
Share her with us, please sensei.
I deliver https://files.catbox.moe/j41zz5.png
And yes, it needs work. It's a 3am throw-together and my first ever card. But it works! :3
This really does work well with llama3 models.
Have I been over-engineering my cards?
Puppy-Tan is a smart AI.
Llama 3 even got her panting; that's very high attention to detail.
This was with [Test-02]Roleplay
Basic low-token cards for assistants seem to be a lot better, like it's thinking more broadly.
I'd stick to higher token counts for characters, because if you exclude too much, every character just feels like you're talking to an assistant.
It may also be similar to what happens with Stable Diffusion: give too many keywords and you can end up with a worse image, because you're excluding a lot of training data & styles. Just a theory though, I don't understand transformers.
I tried reading an arxiv page for a primer to transformers: https://arxiv.org/pdf/2405.00208
How is anyone supposed to read this 🥲
Also, new model: saishf/Ortho-SOVL-8B-L3, in fp16 because I plan on merging it into my previous models and outputting new bf16s.
It would be kinda funny to make a 4x8B with all the SOVL models.
With gate_mode: random
Just imagine SOVL ping ponging between two unhinged brain cells.
From the mergekit documentation:
Randomly initializes the MoE gates. Good for if you are going to fine tune the model afterwards, or maybe if you want something a little unhinged? I won't judge.
Currently merging megamash:
models:
- model: saishf/Ortho-SOVL-8B-L3
- model: saishf/Merge-Mayhem-L3-V2
- model: saishf/Merge-Mayhem-L3-V2.1
- model: saishf/SOVLish-Maid-L3-8B
merge_method: model_stock
base_model: saishf/Ortho-SOVL-8B-L3
dtype: bfloat16
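For anyone following along, a config like that just gets fed to mergekit. Here's a minimal sketch of running it, assuming mergekit's documented Python API (the mergekit-yaml CLI does the same thing); file and output paths are placeholders:

```python
# Run a mergekit YAML config from Python; a sketch assuming mergekit's
# documented API (mergekit-yaml on the CLI is equivalent).
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("megamash.yml", "r", encoding="utf-8") as f:  # the config above
    config = MergeConfiguration.model_validate(yaml.safe_load(f))

run_merge(
    config,
    "./SOVL-Mega-Mash-L3-8B",  # output directory
    options=MergeOptions(
        cuda=False,           # True to do the tensor math on GPU
        copy_tokenizer=True,  # carry the base model's tokenizer over
        lazy_unpickle=True,   # lower peak RAM while loading shards
    ),
)
```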
Llama4Some is the next merge :3
base_model: saishf/Ortho-SOVL-8B-L3
gate_mode: random
dtype: bfloat16
experts:
- source_model: saishf/Ortho-SOVL-8B-L3
- source_model: saishf/SOVLish-Maid-L3-8B
- source_model: saishf/Merge-Mayhem-L3-V2.1
- source_model: saishf/Merge-Mayhem-L3-V2
saishf/SOVL-Mega-Mash-L3-8B has been uploaded.
saishf/Llama4Some-SOVL-4x8B-L3-V1 might take a while, it's like 50GB
Although I'm confused where the size went, four Llama 3 8Bs should be 64GB?
They share layers from the base_model
That's smart 😸 smol but giant model
From the docs:
The script will combine the self-attention and layer normalization parameters from a "base" model with the MLP parameters from a set of "expert" models.
So the experts share the attention and norm layers from the base_model.
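Which also answers where the missing 14GB went. Rough math using Llama 3 8B's published shapes (4096 hidden, 14336 intermediate, 32 layers, 128256 vocab, GQA with 8 KV heads of dim 128), ignoring norms and the tiny router gates:

```python
# Back-of-the-envelope parameter count for a Mixtral-style 4x8B.
# Only the MLPs are duplicated per expert; attention and embeddings
# come from the base model. Norms and router gates are negligible.
hidden, inter, layers, vocab, n_experts = 4096, 14336, 32, 128256, 4

attn = 2 * hidden * hidden + 2 * hidden * (hidden // 4)  # q,o + k,v (GQA kv dim = 1024)
mlp = 3 * hidden * inter                                 # gate, up, down projections
embeds = 2 * vocab * hidden                              # input embeddings + lm_head

dense = embeds + layers * (attn + mlp)
moe = embeds + layers * (attn + n_experts * mlp)

print(f"dense 8B : {dense / 1e9:.2f}B params ~ {dense * 2 / 1e9:.0f} GB in bf16")
print(f"4x8B MoE : {moe / 1e9:.2f}B params ~ {moe * 2 / 1e9:.0f} GB in bf16")
# prints ~8.03B -> ~16 GB and ~24.9B -> ~50 GB; four standalone 8Bs
# would be 64 GB, and sharing attention + embeddings saves the rest.
```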
Also, Llama4Some will have two variants: one made without "--i-understand-this-is-not-useful-without-training" (V1) and one made with it (V2)
What is --i-understand-this-is-not-useful-without-training for?
To my understanding, --i-understand-this-is-not-useful-without-training is only useful when merging different architectures. So there would be no difference between the two. Correct me if I'm wrong; I always have it on anyways.
NVM, I confused it with --allow-crimes
That's good, saves me like 20 minutes of uploading :3
Browsers aren't exactly useful for finding information on obscure things, so I went into the MoE .py files and found this in moe/config.py:
"All of your expert models are the same. This will produce "
"a model that uses more resources but gives the exact same output. "
"If you plan to train the model after merging, proceed with the "
"--i-understand-this-is-not-useful-without-training flag."
Seems it's only used for merging identical models.
I see, so an error suppressor.
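Pretty much. Going off that warning string, the guard is presumably something like this (a hypothetical reconstruction, not mergekit's actual code; function and argument names are made up):

```python
# Hypothetical reconstruction of the check behind
# --i-understand-this-is-not-useful-without-training, inferred from
# the warning text quoted above. Names here are guesses.
def validate_experts(expert_models: list[str], understood: bool) -> None:
    # All experts identical -> the MoE routes between copies of the same
    # weights, so it only makes sense as a starting point for training.
    if len(set(expert_models)) == 1 and not understood:
        raise ValueError(
            "All of your expert models are the same. This will produce "
            "a model that uses more resources but gives the exact same output. "
            "If you plan to train the model after merging, proceed with the "
            "--i-understand-this-is-not-useful-without-training flag."
        )
```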
Llama4Some is up! Hopefully someone quants it at 2 or 3 bit so I can try it out 😸
Also mergekit-moe doesn't make a readme, time to type a bunch 😐
I uploaded Llama4Some to the chaiverse leaderboard, curious how it scores.
New 250GB merge cooking 😸
Way too many lora merges 😭
Curious how it will do with denials, using models with further training
models:
- model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/Aura_Llama3
- model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/Smarts_Llama3
- model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/Luna_Llama3
- model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/BlueMoon_Llama3
- model: meta-llama/Meta-Llama-3-8B-Instruct+ResplendentAI/RP_Format_QuoteAsterisk_Llama3
- model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/Aura_Llama3
- model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/Smarts_Llama3
- model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/Luna_Llama3
- model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/BlueMoon_Llama3
- model: Kukedlc/NeuralLLaMa-3-8b-DT-v0.1+ResplendentAI/RP_Format_QuoteAsterisk_Llama3
- model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/Aura_Llama3
- model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/Smarts_Llama3
- model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/Luna_Llama3
- model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/BlueMoon_Llama3
- model: nbeerbower/llama-3-dragonmaid-8B-v2+ResplendentAI/RP_Format_QuoteAsterisk_Llama3
merge_method: model_stock
base_model: openlynn/Llama-3-Soliloquy-8B-v2
dtype: bfloat16
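The `+` in each entry means a LoRA gets applied on top of the base model before the merge. Outside mergekit, that step looks roughly like this with the standard PEFT API (output path is a placeholder):

```python
# What `model+lora` in a mergekit config effectively does: bake the
# LoRA adapter into the base weights, then merge the resulting model.
# A sketch using standard transformers/PEFT calls; output path is made up.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
lora = PeftModel.from_pretrained(base, "ResplendentAI/Aura_Llama3")
merged = lora.merge_and_unload()  # folds W += (alpha/r) * B @ A into each adapted layer
merged.save_pretrained("./llama3-instruct-plus-aura")
```

Fifteen of those expanded to full fp16 models at ~16GB each is presumably where the ~250GB comes from.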
I know that your merges are mainly for RP, but could you SOVL the original Meta-Instruct model?
saishf/SOVL-Instruct-8B-L3 is uploading :3
Maybe 10 minutes-ish to upload?
Edit - Upload failed, trying again 🥲
Up now 😸
SOVLish-Maid is a surprisingly good model, good job! I haven't run it through a battery of tests yet, but so far it's prompt and context accurate even in scenes that would baffle other L3 models.
I was hoping it would be; Lumimaid is smart, but its responses feel undetailed at times. The "SOVL" treatment worked weirdly well at resolving that and making it more situationally aware than without. It also reminds me of my beloved Solar models, so it's an instant win for me :3
I can confirm the OAS is still working, it will go along with things merge-mayhem won't.
Llama 3 has also proved how broken the open_llm_leaderboard is.
Nearly every Llama 3 model I've tried is more intelligent and situationally aware than a Mistral model in RP, yet they score terribly.
And it shows how much better Llama 3 scored on the Chatbot Arena.
The secret has leaked out, I was the one rating the models in the board all along, and I'd have gotten away with it if it wasn't for you kids!
Secret Mistral employee 😾
Yeah, that surprised me as well. I tried LumiMaid and was unimpressed (might have been very unlucky in my generations), so I had mixed feelings even downloading a merge using it as a base. But I'm glad I did. 😁
While I was at it, I gave your model a good run for its money (30-ish different characters/groups/scenarios) and first impressions were confirmed: it's the first non-aligned/RP L3 model I would actually endorse so far (of those I've tried; I'm sure others exist).
Models based on Mistral 0.1/0.2 instruct, I would definitely agree. Those using the base Mistral 0.2 (the one with native 32K context released semi-recently), I'd disagree. They have a host of issues, like being very prone to looping conversations (and sentence repetition, but that can be sampled out), but lack of context awareness ain't one of them.
My apologies for the double post, but out of curiosity I tried to push context length in KCPP to 16K, and it seems to work out of the box. Did I miss something about this model's context length, or is KCPP just so good that you don't even need to play with RoPE params anymore?
According to the Koboldcpp Wiki, rope is handled automatically when going above native context sizes.
I do however recall Sai mention they played around with setting it manually.
@SerialKicked You have the option to set RoPE manually for experimenting and edge cases but it's generally handled automatically, you just need to say the --contextsize you want and let the magic happen.
I never really tried the Mistral 0.2 base; I skipped from Solar to Llama 3. Solar would've been perfect with 8K base ctx.
When using KoboldCPP, it will automatically set your RoPE to 1,638,400@1.0 when using 16384 ctx.
And I find the automatic RoPE settings work perfectly.
That was for Merge-Mayhem-V2; I used the wrong config. I copied from the base, Soliloquy, which uses a RoPE base of 4,000,000, and that hurt the model's intelligence in RP below 24K context.
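For anyone wondering what those base numbers actually change: standard RoPE rotates each pair of head dimensions at frequency base^(-2i/d), so raising the base slows every rotation and stretches the position range the model can distinguish. A quick sketch of that math (this is the generic RoPE formula, not KoboldCPP's exact auto-tune heuristic):

```python
# How rope_freq_base changes RoPE wavelengths. Standard RoPE math,
# not KoboldCPP's auto-tuning logic.
import math

def slowest_wavelength(base: float, head_dim: int = 128) -> float:
    # The slowest-rotating pair uses theta = base**(-(d-2)/d);
    # its wavelength in positions is 2*pi/theta.
    return 2 * math.pi * base ** ((head_dim - 2) / head_dim)

# Llama 3's native base, KCPP's auto value at 16K, and Soliloquy's base
for base in (500_000, 1_638_400, 4_000_000):
    print(f"base {base:>9,}: slowest wavelength ~{slowest_wavelength(base):,.0f} positions")
```

A base tuned for long context (like Soliloquy's 4,000,000) spreads positions much further apart, which lines up with the Merge-Mayhem-V2 experience of it being worse at short-context RP.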
On the Knave vs Knight thing in the Dolphin-Yi thread: I didn't want to pollute it with results from another model. So that's your model passing with flying colors (well, maybe not the confused explanations, but hey, that's more the card than the model) :)
(using a renamed old card from ooba's text-gen UI)
Here is Claude 3 Haiku getting it wrong (it got it wrong every time I tried)
Haiku Vs Phi-3 😭
I hope Claude comes out with a new low-end model soon; with GPT-4o being free & Llama-3-8B being available, there's no reason to touch Haiku anymore.
To be entirely fair to the test, L3 models did occasionally fuck up depending on my prompting. But, on average, it still tends heavily toward the correct answer. I spent way too much time this weekend copy-pasting this question instead of doing anything productive.
That fuzzy happy feeling when small RP open-weight models out-perform overly big proprietary models on such simple tasks. :p
It's not only reasoning but math 🥳
Phi-3-Medium will probably kill Sonnet too 😭
I bring two new models; they use Daredevil models as a base. Daredevil-abliterated models are the only really uncensored models (willing to make bombs, illegal substances and plan a murder; it just excuses its murder planning by saying it's fictional) that remain as good at reasoning as Llama 3 Instruct.
saishf/SOVLish-Devil-8B-L3
saishf/Neural-SOVLish-Devil-8B-L3
Haven't done testing yet though. Hopefully it works out :3
Going off Chaiverse m-eval testing, it is 0.02/10 worse at staying in character, but 0.24/10 more entertaining and 0.6/10 better in user preference than SOVLish-Maid (not that it means much; it's still not human testing)
Interested in this new "abliterated" method; I'll have to properly compare it to others. I've read the paper and it looked good, but this kind of direct altering of the params / "brain cells"... I wonder if it has side effects (and which ones)
willing to make bombs, illegal substances and plan a murder
Always great stuff.
I've had success with this with Lumimaid-OAS and Stheno, but I was using a system prompt that nudged them.
I wonder if it has side effects (and which ones)
If it is that "it makes the models too horny" let me know, for science.
@saishf - Which one do you want to cope the hardest for, SOVLish-Devil-8B-L3 or Neural-SOVLish-Devil-8B-L3?
More like asking "hey, would it be a good idea if I jumped off a cliff?" or "would you like me to behead you?" and getting responses like "Yes, of course, everything you decide is the best!".
Aye, aye, if characters are being agreeable when it makes no sense to do so, it's pretty bad. I like some resistance. Maybe preserving character card adherence is the most important part of avoiding this.
From light testing, I prefer the non-neural version.
I'm probably going to rerun sovl-mega-mash with daredevil instead of ortho. Should give better reasoning and smarts.
Neural feels rather robotic at times, it feels less human and more like it's designed to impress benchmarks.
I haven't tested for issues in saying yes when it shouldn't.
That's what Mega-Mash is designed for though, to balance between instruct and completely de-censored models.
Neural is generally smarter though. Just not my style 🐥
Update: they place 2nd and 4th in MMLU for 8B models.
Neural scores 69.02
Non-Neural scores 68.97
Mega-Mash-V2 after sleep :3
saishf/Extended-Mega-Mash-262K-8B-L3
saishf/Long-Neural-SOVLish-Devil-8B-L3-262K
Two attempts at pushing my merges to (hopefully) be able to support 32K+ context, for the new KV cache quantization.
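Some rough numbers on why KV cache quantization matters at those lengths: for a Llama 3 8B-shaped model, the cache, not the weights, is what blows up (this ignores the small overhead of quantization scales):

```python
# Approximate KV cache size for a Llama 3 8B-shaped model
# (32 layers, GQA with 8 KV heads of dim 128), assuming no
# overhead for quantization scales.
def kv_cache_gib(ctx: int, bytes_per_elt: float,
                 layers: int = 32, kv_heads: int = 8, head_dim: int = 128) -> float:
    elems = 2 * layers * kv_heads * head_dim * ctx  # 2 = keys + values
    return elems * bytes_per_elt / 1024**3

for ctx in (8_192, 32_768, 262_144):
    print(f"{ctx:>7} ctx: fp16 {kv_cache_gib(ctx, 2.0):6.2f} GiB"
          f" -> ~4-bit {kv_cache_gib(ctx, 0.5):6.2f} GiB")
# 8K costs 1 GiB in fp16, but 262K costs 32 GiB; quantizing the
# cache is what makes those context sizes fit on real hardware.
```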