Error when trying to run in OobaBooga

#1 - opened by AIGUYCONTENT

I'm getting an error when trying to run in OobaBooga:

17:33:41-171561 INFO Loading "tess-v2.5-qwen2-72B-q4_k_m.gguf"
17:33:41-285599 ERROR Failed to load the model.
Traceback (most recent call last):
  File "/home/me/Desktop/text-generation-webui-main/modules/ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/Desktop/text-generation-webui-main/modules/models.py", line 82, in load_model
    metadata = get_model_metadata(model_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/me/Desktop/text-generation-webui-main/modules/models_settings.py", line 67, in get_model_metadata
    bos_token = metadata['tokenizer.ggml.tokens'][metadata['tokenizer.ggml.bos_token_id']]
                                                  ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'tokenizer.ggml.bos_token_id'
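
In case it helps with diagnosing this, here is a minimal sketch (assuming the gguf package from llama.cpp's gguf-py is installed, and with the file path as a placeholder) that lists which tokenizer metadata keys the GGUF file actually contains:

# Minimal diagnostic sketch: print the tokenizer.* keys stored in the GGUF metadata.
# Assumes `pip install gguf` (llama.cpp's gguf-py package); the path is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("tess-v2.5-qwen2-72B-q4_k_m.gguf")

for name in reader.fields:
    if name.startswith("tokenizer."):
        print(name)

# If "tokenizer.ggml.bos_token_id" does not appear in the output, the KeyError above
# comes from the file's metadata lookup rather than from running out of VRAM.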

How much VRAM do you have, sir?
The q4_k_m quant uses about 47.4 GB and the q3_k_m about 37.7 GB, so maybe that is the problem?
That said, I haven't used it in OobaBooga myself; I don't have the hardware for it.
This was just made to run in a Space here https://huggingface.co/spaces/poscye/chat-with-tess and it works great there.

You can also try the quants that @bartowski made here https://huggingface.co/bartowski/Tess-v2.5-Qwen2-72B-GGUF#download-a-file-not-the-whole-branch-from-below and pick one based on the hardware you have. He made a complete set of quants.

Wouldn't surprise me if oobabooga needed a llama.cpp update :')

I have 50 GB of VRAM. It's weird because the second I press "Load" in OobaBooga, I get the error message, so I don't think it's a VRAM issue. ChatGPT says the model is missing a tokenizer metadata key (tokenizer.ggml.bos_token_id).

Oh hey Bartowski, are you saying I need to update that manually? I run this on Ubuntu and just updated the WebUI a few hours ago.

I will give your quant a test. I have a 4090, a 4080, and a 3080 hooked up to my machine, and I'll be swapping the 3080 for a 3090 tomorrow for a total of 64 GB of VRAM (24 GB + 16 GB + 24 GB). I hope that is enough.

The reason I suspect it's ooba's issue is that running raw llama.cpp with ./main (well, ./llama-cli now..) yields perfect output. If the upstream source works but a downstream project doesn't, the downstream project is probably what's breaking it (ooba's webui in this case).
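
For illustration only, this is roughly the shape of a defensive lookup that would avoid the crash in modules/models_settings.py; the safe_bos_token helper is hypothetical, not the actual webui code or an official fix:

# Hypothetical sketch of a guarded version of the lookup that raised the KeyError;
# not the real text-generation-webui code, just an illustration of a fallback.
def safe_bos_token(metadata: dict) -> str:
    """Return the BOS token from GGUF metadata, or '' if the keys are missing."""
    bos_token_id = metadata.get('tokenizer.ggml.bos_token_id')
    tokens = metadata.get('tokenizer.ggml.tokens')
    if bos_token_id is None or tokens is None:
        return ''  # let llama.cpp fall back to its own default BOS handling
    return tokens[bos_token_id]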

pabloce changed discussion status to closed