Generate a bunch of crap?

#2 opened by huggingfacess

Very strange: running the following example produces garbled output.

```
./main -ngl 35 -m Calme-4x7B-MoE-v0.2-GGUF.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"
```

```
As a model trained on

20: Question: newlines:

  • and the most commonly known as an individual to:
  1. A large amountingredactually,
    INSTARTIme in598746/ "The question:As a humanoidentitled by the of AI is designed to #A20 on an important AI generated from areservaries an open AIr an dtookaying a instantiq an individual a, which includes one oftheone of the with are known as being trained, a key in many be a trainAI

INST at the to a commona t a important an unnamed byof theare designed here ares Ais an important individualizes theare more often is the, a 'd the one is commonly referred the one are all ofthe most important a are being a spacefor 5: The H. Here a varietyn's are not only to a common theare uncommona the period4s T o Nan A and0be to be 1 ( the are
to F2 I, known 7 33re " is a # an E for Manning 9 This array 6A 5h some, with the American 8n The 2. As 3 were designed by 4, its own 20 to 91 are a T-like

Nieline known Train Commonalysis train in your trainable 9
various trains
```
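
For reference, `{system_message}` and `{prompt}` in the command above are just the ChatML template placeholders; a concrete run looks something like this (the exact messages are example values and don't change the outcome):

```sh
./main -ngl 35 -m Calme-4x7B-MoE-v0.2-GGUF.Q4_K_M.gguf --color -c 32768 \
  --temp 0.7 --repeat_penalty 1.1 -n -1 \
  -p "<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant"
```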

Yes, there is an issue in llama.cpp with quantizing MoE models that come from Mergekit. (The actual model works fine, as does the fp16 GGUF.)

I am following it up in llama.cpp; hopefully they can fix/support MoE quantization soon and I will re-upload the models. (I will upload the working fp16 shortly.)
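
For anyone following along, the workflow is roughly the usual llama.cpp two-step (file names here are illustrative, and the exact convert script depends on the architecture): the f16 GGUF from the convert step behaves fine, and the problem only shows up after the quantize step.

```sh
# Convert the merged HF model to an f16 GGUF (this output works fine):
python convert.py ./Calme-4x7B-MoE-v0.2 --outtype f16 \
    --outfile Calme-4x7B-MoE-v0.2.fp16.gguf

# Quantize it to Q4_K_M (the garbled output appears with this file):
./quantize Calme-4x7B-MoE-v0.2.fp16.gguf Calme-4x7B-MoE-v0.2.Q4_K_M.gguf Q4_K_M
```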

Maybe it has something to do with this.

https://github.com/ggerganov/llama.cpp/pull/5754

Good find! I've been looking for mergekit-related issues, and there is a lot of work in progress.

However, this PR seems to have been merged already, and I quantized these models with a build from 2 days ago. Unfortunately, that means I was already using the PR and it is still not working. I'll keep looking for an appropriate issue/PR to chime in on; if I can't find any, I'll open one.
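
In case it helps anyone reproducing this, here is a quick way to confirm whether a local checkout/build actually contains that PR (plain git, nothing llama.cpp-specific):

```sh
# Merged PRs normally carry their number in the squash/merge commit message:
git log --oneline --grep='#5754'

# Alternatively, fetch the PR head and test ancestry
# (this can miss squash-merged PRs, so treat a negative result with care):
git fetch origin pull/5754/head:pr-5754
git merge-base --is-ancestor pr-5754 HEAD && echo "PR branch is an ancestor of HEAD"
```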

It seems to be the same problem; I don't know how he solved it.

https://huggingface.co/zhengr/MixTAO-7Bx2-MoE-v8.1/discussions/3
https://huggingface.co/MaziyarPanahi/MixTAO-7Bx2-MoE-v8.1-GGUF/discussions/3

So here is the odd part: my quantization of his model actually works:

[screenshot: image.png]

So this could mean either that llama.cpp worked with MoE models at some point and broke in very recent changes (I always pull and make a fresh build), or that the way he made that MoE is very different. I'll see if I can ask him how he made the MoE: was it just a Mergekit merge with hidden gates, or did he do something extra?
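
If it does turn out to be a recent regression in llama.cpp rather than a difference in how the MoE was merged, a git bisect over the quantize step should narrow down the offending commit. A rough sketch (the good commit and test prompt are placeholders):

```sh
# <good-commit> stands for any revision where the Q4_K_M quant of this
# model still produced coherent text.
git bisect start
git bisect bad HEAD
git bisect good <good-commit>

# At each bisect step: rebuild, re-quantize, check the output,
# then mark the revision and let git pick the next one.
make clean && make -j
./quantize Calme-4x7B-MoE-v0.2.fp16.gguf test.Q4_K_M.gguf Q4_K_M
./main -m test.Q4_K_M.gguf -p "Hello, how are you?" -n 32
git bisect good   # or: git bisect bad
```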
