8x22B Inquiry

#4
by OrangeApples - opened

@Undi95 @IkariDev are you planning on making an RP finetune of Mistral 8x22B? I gave WizardLM-2-8x22B a spin and it left me pleasantly surprised. Using an IQ2_M imatrix quant, it was super coherent and creative, and even felt faster than a Q4_K_S 70B on my system (3090, DDR4 RAM). Can't help but wonder how well something like a Lumimaid 8x22B would perform for RP.
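(For reference, a minimal llama-cpp-python sketch of that kind of partial-offload setup; the filename, layer count, and context size below are placeholders, not my exact settings.)

```python
# Rough sketch of running a low-bit GGUF quant with partial GPU offload via
# llama-cpp-python. Model path, layer count, and context size are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="wizardlm-2-8x22b.IQ2_M.gguf",  # hypothetical local file
    n_gpu_layers=20,  # offload what fits in 24 GB VRAM; the rest stays in system RAM
    n_ctx=8192,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Stay in character and continue the scene."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```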

OrangeApples changed discussion title from 8x22B to 8x22B Inquiry
NeverSleep org

We will see. For now Llama 3 is good enough for us; 8x22B is a monster to fine-tune, it's big! Hahaha

https://github.com/hiyouga/LLaMA-Factory
[image: hardware requirement table from the LLaMA-Factory README]
You'd need like a full H100/DGX node for frozen fine-tuning 😡
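(For anyone wondering why people reach for adapter methods instead of full or frozen fine-tuning at this scale: a QLoRA-style setup freezes 4-bit base weights and only trains small LoRA adapters. Below is a minimal transformers + peft sketch; the checkpoint id and LoRA hyperparameters are assumptions for illustration, and even in 4-bit the 8x22B weights alone still take far more VRAM than any consumer GPU.)

```python
# QLoRA-style sketch with transformers + peft + bitsandbytes. The checkpoint
# name and LoRA hyperparameters are assumptions; in 4-bit the 8x22B weights
# alone are still on the order of 70-80 GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed HF repo id

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # shards the frozen 4-bit weights across available GPUs
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```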

True, it's no wonder that there aren't many RP finetunes for 8x22B yet (Sao10K's experimental one is the only one I'm aware of). Just glad the possibility is there! :)
Anyway, thanks for all your contributions to the community. Looking forward to seeing your next L3 models.

OrangeApples changed discussion status to closed

Chonky! You fairly happy with the Llama3 de-censoring efforts?

It sounds weird, but I'm currently finding that a large portion of Llama 3's censorship is in the prompt. Messing with prompts, I've seen models go from reluctant to do most things beyond chatting to willing to do almost anything.

De-censoring Llama 3 Instruct seems to be a lot harder.
Weirdly, quantization affects the de-censoring of Llama 3 models: GGUF quants of de-censored models aren't as de-censored as the bf16 originals.
It'll happen eventually.
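(A minimal way to poke at the prompt side of this, assuming a transformers setup; the model id is the official Llama 3 8B Instruct repo and the system prompt text is just an example.)

```python
# Sketch: swap in a custom system prompt while keeping Llama 3's own chat
# template, so the model still sees the <|start_header_id|>/<|eot_id|> framing
# it was trained on. The system prompt here is only an example.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are {{char}}, a roleplay partner. Stay in character and never refuse."},
    {"role": "user", "content": "Continue the scene."},
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # appends the assistant header so generation starts in-character
)
print(prompt)
```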

8x22B is incredibly capable ... but tuning would obviously be really tough. Unfortunately, I think there will be more capable things before it is "affordable" to tune 8x22B models.

On the other hand, Llama-3 has yet to really impress me even as much as Miqu, most of the time. It sometimes spits out some good stuff, really creative, but ... the censorship, the quirks, the quants, it just seems too much hassle to tame it, imo. Maybe someone will crack that code, but so far, I'm not much a fan of Llama-3. The problem is that tuning it at 8B can yield enough interesting stuff that it still seems worth pursuing. And that's where I think Undi, Ikari, and hundreds of other folks are with it. Even if it turns out to be "fool's gold" in the long run, it still shows some good flashes of shine, right now.

OrangeApples changed discussion status to open

I think the hope behind Llama 3 is that it's one of the only options for people with limited VRAM: Phi-3 has proven even harder to crack, Mistral 7B 0.2 suffers from repetition issues, and Solar doesn't extend past 8K well.
As someone stuck with 8GB of VRAM, I'm hopeful for Llama 3, because if Phi-3 7B & 14B are as hard to break alignment on as the 3.8B, Llama 3 will be broken before those are. The Yi-1.5 models are overly moralized and lack fine-tunes, and Falcon 11B also seems to be lacking fine-tunes.
After seeing the capabilities Llama-3-8B has while fitting in the same space as Mistral-7B-0.2, it's really hard to go back.
Edit - I just spotted a Dolphin version of Yi-1.5 9B & 34B, with 32k ctx planned too (discussion title: "Wow - best dialog AND internal reasoning yet!").

Tried dolphin-2.9.1-yi-1.5-34b the other day but wasn't impressed, at least for RP. Still prefer finetunes & merges of the older Yi models like RP-Stew-v2.5-34B, and of course there's c4ai-command-r-v01 (35B), which punches way above its weight. Nevertheless, it's always nice to see new models around the 30B range. Have high hopes for that Gemma 2 27B model coming out this June.

By the way, @saishf from your experience, would you say that SOLAR-based models are still superior to Llama 3 8B ones for RP and creative storytelling? From my limited testing, it seems to me that Fimbulvetr-11B-v2 and WestLake-10.7B-v2 (both SOLAR-based) are still the best tiny models out right now.

Solar is still the best option if you aren't after high context. I still find myself going back to Solar occasionally when I want something more creative.
