ExLlama is not working: received "shape '[1, 64, 64, 128]' is invalid for input of size 65536" error

#6
by charleyzhuyi - opened

I've updated text-generation-webui to the latest version (transformers updated to 4.31.0) and also manually verified that the exllama folder has been updated and contains this commit: https://github.com/turboderp/exllama/commit/b3aea521859b83cfd889c4c00c05a323313b7fee

ExLlama is able to load the model, but as soon as I start typing, I get:

Traceback (most recent call last):
File "c:\oobabooga_windows\text-generation-webui\modules\text_generation.py", line 331, in generate_reply_custom
for reply in shared.model.generate_with_streaming(question, state):
File "c:\oobabooga_windows\text-generation-webui\modules\exllama.py", line 98, in generate_with_streaming
self.generator.gen_begin_reuse(ids)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\generator.py", line 186, in gen_begin_reuse
self.gen_begin(in_tokens)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\generator.py", line 171, in gen_begin
self.model.forward(self.sequence[:, :-1], self.cache, preprocess_only = True, lora = self.lora)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 887, in forward
r = self._forward(input_ids[:, chunk_begin : chunk_end],
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 968, in _forward
hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 471, in forward
hidden_states = self.self_attn.forward(hidden_states, cache, buffer, lora)
File "c:\oobabooga_windows\installer_files\env\lib\site-packages\exllama\model.py", line 389, in forward
key_states = key_states.view(bsz, q_len, self.config.num_attention_heads, self.config.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 64, 64, 128]' is invalid for input of size 65536
Output generated in 0.00 seconds (0.00 tokens/s, 0 tokens, context 65, seed 789726404)
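
For what it's worth, the numbers in the error line up with a grouped-query-attention mismatch: Llama-2-70B has 64 attention heads but only 8 key/value heads, so a 64-token chunk produces 1 × 64 × 8 × 128 = 65536 key elements, which cannot be viewed as [1, 64, 64, 128]. A minimal sketch of the failing arithmetic (shapes taken from the traceback; the head counts are my assumption about the model):

```python
import torch

# Shapes from the traceback; head counts assumed for Llama-2-70B (GQA).
bsz, q_len, head_dim = 1, 64, 128
num_attention_heads = 64   # query heads
num_key_value_heads = 8    # key/value heads under grouped-query attention

# The key projection only produces num_key_value_heads * head_dim features
# per token: 1 * 64 * 8 * 128 = 65536 elements in total.
key_states = torch.zeros(bsz, q_len, num_key_value_heads * head_dim)

# Pre-GQA code path: one KV head per attention head -> the RuntimeError above.
# key_states.view(bsz, q_len, num_attention_heads, head_dim)

# GQA-aware view that matches the actual element count.
key_states = key_states.view(bsz, q_len, num_key_value_heads, head_dim).transpose(1, 2)
print(key_states.shape)  # torch.Size([1, 8, 64, 128])
```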

Is anyone able to get ExLlama working?

Thanks

Use the latest version from the main branch of exllama and the latest released version of transformers.
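
A quick sanity check (just a sketch; run it inside the webui's Python environment) to confirm which transformers version is actually being imported:

```python
import transformers

# Llama-2 support (including the 70B GQA config) landed in transformers 4.31.0.
print(transformers.__version__)
```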

text-generation-webui provides its own exllama wheel, and I don't know if that's been updated yet. Try pip3 uninstall exllama in the Python environment of text-generation-webui, then run it again. That will cause exllama to automatically build its kernel extension on model load, which will therefore definitely include the Llama 70B changes.
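
If you want to confirm the uninstall took effect, one hedged check (run in the same environment; the import path is the only assumption here) is to look at where the package resolves from. If it still points into site-packages, the old wheel is still shadowing the updated code:

```python
import exllama

# After `pip3 uninstall exllama`, this should no longer point into
# installer_files\env\...\site-packages; if it does, the stale wheel survived.
print(exllama.__file__)
```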
