Is it possible to upload a merged/quantized GGUF?

#1 by robbie0 - opened

I'm very interested in tinkering with this model after seeing highly impressive results with v0.3.1-hf, but neither my machine nor Colab has enough RAM to merge and quantize this model down to something that will run locally for me. Would it be possible to upload merged GGUFs quantized to Q8_0 and Q4_K_M?
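For context, just the merge step already needs the full-precision weights in memory, which is exactly what my setup can't handle. A rough sketch of that step with PEFT (the base model and adapter repo IDs below are only placeholders, not the actual names used for this release):

```python
# Rough sketch only -- the repo IDs below are placeholders, not the actual
# base model / adapter names used for this release.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen-14B"                    # placeholder base model
adapter_id = "lmg-anon/vntl-qwen-14b-qlora"  # placeholder adapter repo

base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

merged.save_pretrained("vntl-qwen-14b-merged")
AutoTokenizer.from_pretrained(base_id, trust_remote_code=True).save_pretrained(
    "vntl-qwen-14b-merged"
)
# From here, llama.cpp's convert script and quantize tool would produce the
# Q8_0 / Q4_K_M GGUF files.
```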

Also, have you looked into using rinna/nekomata-14b as a base?

Owner • edited Feb 18

@robbie0 Thanks for your interest in my experiment, and sorry for the late reply 😅
I haven't tested this 14b qlora properly yet (nor the 13b one) because I ran out of time, but sure, I will upload the GGUF for you.

> Also, have you looked into using rinna/nekomata-14b as a base?

I didn't know about it; that continued training sounds promising! However, I strongly suspect that what VNTL lacks right now is a better dataset, so I'm not sure it's worth trying another base model for now.

@robbie0 Uh... sorry, but I couldn't make a GGUF. I managed to merge the model, but llama.cpp's conversion script doesn't seem to handle llama-architecture models that don't use the SPM (SentencePiece) tokenizer, and even after I tried to patch the script, the resulting model didn't translate anything correctly.
So I uploaded the merged model instead; it's the best I can do at the moment: https://huggingface.co/lmg-anon/vntl-qwen-14b-v0.1-hf
However, I did manage to upload the GGUF for the 13B model: https://huggingface.co/lmg-anon/vntl-13b-v0.2-gguf

Thanks so much for your work! I wonder if it would be worth opening an issue on llama.cpp for the tokenizer bug. Other than that, I'll try to see if I can work the 13B model into my program.
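In case it helps anyone else doing the same, here's roughly how I'd plan to load the 13B GGUF with llama-cpp-python (the quant file name and the prompt template below are just guesses on my part; the model card will have the actual ones):

```python
# Sketch with llama-cpp-python; the quant file name and prompt template are
# assumptions -- check the model card for the real ones.
from llama_cpp import Llama

llm = Llama(
    model_path="vntl-13b-v0.2.Q4_K_M.gguf",  # placeholder file name
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers if VRAM allows; 0 for CPU-only
)

prompt = "<<JAPANESE>>\nこんにちは、元気ですか？\n<<ENGLISH>>\n"  # hypothetical template
out = llm(prompt, max_tokens=128, stop=["<<JAPANESE>>"])
print(out["choices"][0]["text"])
```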
