Q3 and Q2 quants broken
Undi's own provided GGUF files seem to work fine, but not these.
I keep getting errors when trying to load them in oobabooga's text-generation-webui. I tried both the llamacpp and llamacpp_HF loaders, and neither works.
Llamacpp loader error:

```
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 5120, 32001, got 5120, 32000, 1, 1
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "C:\Users\pasil\text-generation-webui\server.py", line 223, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "C:\Users\pasil\text-generation-webui\modules\models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "C:\Users\pasil\text-generation-webui\modules\models.py", line 225, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
  File "C:\Users\pasil\text-generation-webui\modules\llamacpp_model.py", line 91, in from_pretrained
    result.model = Llama(**params)
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\llama_cpp_cuda\llama.py", line 365, in __init__
    assert self.model is not None
AssertionError
Exception ignored in: <function LlamaCppModel.__del__ at 0x000001ABCECE8AF0>
Traceback (most recent call last):
  File "C:\Users\pasil\text-generation-webui\modules\llamacpp_model.py", line 49, in __del__
    self.model.__del__()
AttributeError: 'LlamaCppModel' object has no attribute 'model'
```
Llamacpp_hf loader error:

```
error loading model: create_tensor: tensor 'token_embd.weight' has wrong shape; expected 5120, 32001, got 5120, 32000, 1, 1
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "C:\Users\pasil\text-generation-webui\server.py", line 223, in <module>
    shared.model, shared.tokenizer = load_model(model_name)
  File "C:\Users\pasil\text-generation-webui\modules\models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "C:\Users\pasil\text-generation-webui\modules\models.py", line 250, in llamacpp_HF_loader
    model = LlamacppHF.from_pretrained(model_name)
  File "C:\Users\pasil\text-generation-webui\modules\llamacpp_hf.py", line 211, in from_pretrained
    model = Llama(**params)
  File "C:\Users\pasil\anaconda3\envs\textgen\lib\site-packages\llama_cpp_cuda\llama.py", line 365, in __init__
    assert self.model is not None
AssertionError
```
Actually, although the repo is misconfigured, when I try to make a new GGUF now, the conversion correctly ignores added_tokens.json. So it's possible I made a mistake when making the first quants, like maybe I edited my local files the wrong way. I can't remember now what I did for this model specifically, but I can see I had to make some local edit to it.
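For anyone curious what the error actually means, here's a minimal sketch (with hypothetical numbers and token names, not this repo's actual files) of how an added_tokens.json entry shifts the declared vocab size and trips the shape check llama.cpp runs when loading the embedding tensor:

```python
import json

# Hypothetical contents of added_tokens.json: one extra token appended
# after a base SentencePiece vocab of 32000 entries.
added_tokens_json = '{"<pad>": 32000}'

base_vocab = 32000                        # tokens defined in tokenizer.model
added = json.loads(added_tokens_json)     # extra tokens from added_tokens.json
vocab_declared = base_vocab + len(added)  # 32001: what the metadata ends up declaring

# If the GGUF metadata counts the added token but the checkpoint's embedding
# matrix only has rows for the base vocab, the shapes disagree and loading
# aborts, which matches the error above:
#   tensor 'token_embd.weight' has wrong shape;
#   expected 5120, 32001, got 5120, 32000, 1, 1
n_embd = 5120
expected_shape = (n_embd, vocab_declared)  # (5120, 32001)
actual_shape = (n_embd, base_vocab)        # (5120, 32000)
print(expected_shape == actual_shape)      # False -> llama.cpp refuses to load
```

That would explain why a fresh conversion that ignores added_tokens.json produces a working file: both the metadata and the tensor then agree on 32000.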
Anyway, I've got them working now and the new upload will start in a moment.
New quants are uploaded and working fine.
Q3_K_M:

```
system_info: n_threads = 15 / 30 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
How much wood would a woodchuck chuck if a woodchuck could chuck wood?

### Response:
A woodchuck, also known as a groundhog, is a herbivore and does not typically "chuck" (throw) wood. They are more likely to burrow into the ground or move leaves and debris with their strong front legs. However, if we were to assume that a woodchuck could chuck wood like an axe, it's impossible to determine how much wood they would be able to throw due to lack of information about their physical strength or any context regarding the size and type of wood being referred to. [end of text]
```
Yeah, thanks, now it's working for me as well.