Error when trying to run in OobaBooga

#1 - opened by AIGUYCONTENT

I'm getting an error when trying to run in OobaBooga:

17:33:41-171561 INFO Loading "tess-v2.5-qwen2-72B-q4_k_m.gguf"
17:33:41-285599 ERROR Failed to load the model.
Traceback (most recent call last):
  File "/home/me/Desktop/text-generation-webui-main/modules/ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/Desktop/text-generation-webui-main/modules/models.py", line 82, in load_model
    metadata = get_model_metadata(model_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/Desktop/text-generation-webui-main/modules/models_settings.py", line 67, in get_model_metadata
    bos_token = metadata['tokenizer.ggml.tokens'][metadata['tokenizer.ggml.bos_token_id']]
                                                  ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'tokenizer.ggml.bos_token_id'
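
In case it helps with diagnosing this, here is a minimal sketch (assuming the gguf package from llama.cpp's gguf-py is installed, and with the file path as a placeholder) that lists which tokenizer metadata keys the GGUF file actually contains:

# Minimal diagnostic sketch: print the tokenizer.* keys stored in the GGUF metadata.
# Assumes `pip install gguf` (llama.cpp's gguf-py package); the path is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("tess-v2.5-qwen2-72B-q4_k_m.gguf")

for name in reader.fields:
    if name.startswith("tokenizer."):
        print(name)

# If "tokenizer.ggml.bos_token_id" does not appear in the output, the KeyError above
# comes from the file's metadata lookup rather than from running out of VRAM.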

How much VRAM do you have, sir?
The q4_k_m quant uses about 47.4 GB and the q3_k_m about 37.7 GB, so maybe that is the problem?
That said, I haven't used it in OobaBooga myself; I don't have the hardware for it.
This was just made to run in a Space here https://huggingface.co/spaces/poscye/chat-with-tess and it works great there.

You can also try the quants that @bartowski made here https://huggingface.co/bartowski/Tess-v2.5-Qwen2-72B-GGUF#download-a-file-not-the-whole-branch-from-below and pick one based on the hardware you have. He made a complete set of quants.

Wouldn't surprise me if oobabooga needed a llama.cpp update :')

I have 50 GB of VRAM. It's weird because the second I press "Load" in OobaBooga, I get the error message, so I don't think it's a VRAM issue. ChatGPT says the model is missing a tokenizer metadata key (tokenizer.ggml.bos_token_id).

Oh hey Bartowski, are you saying I need to update that manually? I run this on Ubuntu and just updated the WebUI a few hours ago.

I will give your quant a test. I have a 4090, a 4080, and a 3080 hooked up to my machine, and I'll be swapping the 3080 for a 3090 tomorrow for a total of 64 GB of VRAM (24 GB + 16 GB + 24 GB). I hope that is enough.

The reason I suspect it's ooba's issue is that running raw llama.cpp with ./main (well, ./llama-cli now..) yields perfect output. If the upstream source works but a downstream project doesn't, the downstream project is probably what's breaking it (ooba's webui in this case).
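
For illustration only, this is roughly the shape of a defensive lookup that would avoid the crash in modules/models_settings.py; the safe_bos_token helper is hypothetical, not the actual webui code or an official fix:

# Hypothetical sketch of a guarded version of the lookup that raised the KeyError;
# not the real text-generation-webui code, just an illustration of a fallback.
def safe_bos_token(metadata: dict) -> str:
    """Return the BOS token from GGUF metadata, or '' if the keys are missing."""
    bos_token_id = metadata.get('tokenizer.ggml.bos_token_id')
    tokens = metadata.get('tokenizer.ggml.tokens')
    if bos_token_id is None or tokens is None:
        return ''  # let llama.cpp fall back to its own default BOS handling
    return tokens[bos_token_id]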

pabloce changed discussion status to closed