It seems worse than OpenChat 3.5 base?

#1
by BrainSlugs83 - opened

I'm just trying to use this as a stand-in for 3.5, and it gets very poor results. I'm not running any well-known benchmarks, just chatting with it -- asking it to summarize passages of text, etc. -- and it gets confused very quickly, gives wrong answers to simple questions, or misunderstands what the user is trying to say. OpenChat 3.5 (non-16k) behaves much better out of the box.

Is there a specific format change or specific configuration required for this one vs the base?

I'm using LlamaSharp (a wrapper for llama.cpp), if that helps.
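For reference, here's roughly how I'm loading it (a simplified sketch; the exact LlamaSharp API surface varies by version, and the model path is a placeholder):

```csharp
using System;
using LLama;
using LLama.Common;

// Simplified sketch of my setup. LlamaSharp API details vary by version;
// the model filename below is a placeholder for TheBloke's Q8 GGUF.
var parameters = new ModelParams("openchat_3.5-16k.Q8_0.gguf")
{
    ContextSize = 16384
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

await foreach (var token in executor.InferAsync(
    "Summarize the following passage: ...",
    new InferenceParams()))
{
    Console.Write(token);
}
```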

NurtureAI org

Not sure how LlamaSharp works, but this was tested using transformers and also using the original openchat prompt. Note that llama.cpp only works with GGUF files, and this is not one.

NurtureAI org

Also try a lower temp and a high top_p. Hope that helps.

> Not sure how LlamaSharp works, but this was tested using transformers and also using the original openchat prompt. Note that llama.cpp only works with GGUF files, and this is not one.

Oh, apologies -- you are correct. To be clear, I'm using @TheBloke 's GGUF conversions of both models at Q8 for comparison.

FWIW, he seems to be "the guy" for uploading GGUF conversions of popular models, and his model page links back to this one as the source for the 16k variant. But maybe I should check on that page and log an issue there first?

> Also try a lower temp and a high top_p. Hope that helps.

I'll give it a shot.
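For reference, in LlamaSharp that maps to something like this (a sketch only; in some versions these knobs sit directly on InferenceParams, in newer ones they've moved to a sampling pipeline):

```csharp
using System.Collections.Generic;
using LLama.Common;

// Sketch: lower temperature, high top_p, per the suggestion above.
// Property locations vary across LlamaSharp versions.
var inferParams = new InferenceParams
{
    Temperature = 0.2f,  // lower temp
    TopP = 0.95f,        // high top_p
    AntiPrompts = new List<string> { "<|end_of_turn|>" }
};
```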

Though another thing just occurred to me: is there anything regarding YaRN / rope scaling that needs to be configured when using this variant of the model?

NurtureAI org

llama.cpp should read it from the GGUF file; setting the rope values to 0 should work.

NurtureAI org

Also check out this Reddit thread for all the stuff we figured out to make the openchat prompt work correctly with llama.cpp; it was rather tricky: https://www.reddit.com/r/LocalLLaMA/comments/185my1b/

That model is Starling, but it still uses the openchat prompt, so I just wanted to show you what we found.
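For reference, the template itself looks like this (per the openchat model card; shown here as a hand-built C# string -- getting <|end_of_turn|> handled as a special token is the tricky part that thread covers):

```csharp
// OpenChat 3.5 prompt format, from the openchat model card.
// The gotcha: <|end_of_turn|> must be tokenized as a special token,
// not as literal text, or output quality degrades.
static string BuildPrompt(string userMessage) =>
    $"GPT4 Correct User: {userMessage}<|end_of_turn|>GPT4 Correct Assistant:";
```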

NurtureAI org

Do not use the rope values that you see in that thread, however, as they are incorrect for the 16k model.

NurtureAI org

Yours should be rope_freq_base = 0 and rope_freq_scale = 0; if that doesn't work, set the real values: 100000.0 and 1.

NurtureAI org

Those values would be correct for the 16k. I hope that helps; let me know.
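If LlamaSharp exposes llama.cpp's rope knobs on ModelParams (it should, but check your version for the exact property names), that would look roughly like this:

```csharp
using LLama.Common;

// Sketch: 0 tells llama.cpp to take the rope values from the GGUF
// metadata; if that misbehaves, fall back to the explicit 16k values
// (100000.0 and 1) given above. Model path is a placeholder.
var parameters = new ModelParams("openchat_3.5-16k.Q8_0.gguf")
{
    ContextSize = 16384,
    RopeFrequencyBase = 0f,    // or 100000.0f
    RopeFrequencyScale = 0f    // or 1f
};
```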

NurtureAI org

I just realized I can link you to the settings config for LM Studio and Starling: https://huggingface.co/NurtureAI/Starling-LM-11B-alpha-v1-GGUF/blob/main/lmstudio-config.json

Remember, though: do not use those values for rope scaling. Everything else should be the same, except the context size, which would obviously be larger for 16k, not 4096.

NurtureAI org

With openchat, once we figured out the template properly, it really unlocked the full potential of both openchat and Starling.

How do you run this with vLLM, as per the OpenChat guide? With this one it failed, complaining there is no openchat.json file. Thanks!

> I just realized I can link you to the settings config for LM Studio and Starling: https://huggingface.co/NurtureAI/Starling-LM-11B-alpha-v1-GGUF/blob/main/lmstudio-config.json
>
> Remember, though: do not use those values for rope scaling. Everything else should be the same, except the context size, which would obviously be larger for 16k, not 4096.

Link is 404'd.
