"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"
Good evening,
I am attempting to host Alpaca through the Text Generation Web UI (oobabooga). The model loads successfully with the AutoGPTQ model loader, but when I enter a prompt and wait for a response, the UI shows "Is typing..." for about two minutes and then the console prints the error below.
My knowledge of AI is limited, so I would appreciate any insight into what might be causing this. My machine specs are as follows:
NVIDIA GeForce GTX 1050 Ti
24 GB installed RAM
Traceback (most recent call last):
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\text-generation-webui\modules\callbacks.py", line 55, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\text-generation-webui\modules\text_generation.py", line 293, in generate_with_callback
shared.model.generate(**kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling_base.py", line 438, in generate
return self.model.generate(**kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1588, in generate
return self.sample(
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2642, in sample
outputs = self(
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 806, in forward
outputs = self.model(
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 693, in forward
layer_outputs = decoder_layer(
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\transformers\models\llama\modeling_llama.py", line 408, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\nn_modules\fused_llama_attn.py", line 53, in forward
qkv_states = self.qkv_proj(hidden_states)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Justin\Desktop\GPT4\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\nn_modules\qlinear\qlinear_cuda_old.py", line 264, in forward
out = out + self.bias if self.bias is not None else out
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Output generated in 34.38 seconds (0.00 tokens/s, 0 tokens, context 37, seed 219873787)
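The last frame points at the bias add inside AutoGPTQ's quantized linear layer: `out` is on `cuda:0` while `self.bias` is on the CPU, which typically happens when accelerate offloads part of the model off the GPU (note the `accelerate\hooks.py` frame above). A minimal sketch of the same failure mode, assuming PyTorch is installed; the CUDA branch only runs if a GPU is present:

```python
import torch

def same_device(*tensors):
    # True only if every tensor lives on the same torch device
    return len({t.device for t in tensors}) == 1

out = torch.zeros(4)       # cpu
bias = torch.zeros(4)      # cpu
assert same_device(out, bias)

if torch.cuda.is_available():
    out = out.to("cuda:0")                 # out is now on the GPU, bias is not
    assert not same_device(out, bias)
    try:
        out + bias                         # raises the same RuntimeError as above
    except RuntimeError as e:
        print(e)
    bias = bias.to(out.device)             # the fix: move both operands to one device
    assert same_device(out, bias)
```

Any elementwise op between the two mismatched tensors triggers exactly this RuntimeError, which is why generation dies on the first forward pass.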
I don't think a 1050 Ti has enough VRAM to run this model at all.
The model is just barely over 8 GB, meaning you need 10 GB of VRAM minimum; the 1050 Ti is a 4 GB card according to NVIDIA's website.
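The arithmetic behind that reply can be sketched as a quick fit check. The 8 GB figure is the model size mentioned above; the ~1.2x headroom factor for activations and the KV cache is a rough assumption, not a measured number:

```python
def fits_in_vram(model_gb: float, vram_gb: float, headroom: float = 1.2) -> bool:
    """Rough check: quantized weights plus ~20% headroom for
    activations and the KV cache must fit in the card's VRAM."""
    return model_gb * headroom <= vram_gb

# An ~8 GB GPTQ model needs roughly 10 GB of VRAM...
print(fits_in_vram(8.0, 10.0))  # True
# ...so a 4 GB GTX 1050 Ti cannot hold it
print(fits_in_vram(8.0, 4.0))   # False
```

When the whole model doesn't fit, accelerate silently places the overflow on the CPU, which is the usual path to the cuda:0/cpu mismatch above.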
I don't know whether oobabooga supports CPU inference or loading only part of the model onto the GPU.
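For what it's worth, the webui did expose flags for both at the time, though whether they help with a GPTQ model is version-dependent. A hedged sketch; the exact flag names may differ in your install, so verify with `python server.py --help`:

```shell
# Cap GPU usage so layers beyond ~3 GB are offloaded to system RAM
# (slow, but avoids overcommitting a 4 GB card)
python server.py --auto-devices --gpu-memory 3

# Or skip the GPU entirely and generate on CPU (very slow for a 7B model,
# and quantized GPTQ kernels generally expect CUDA)
python server.py --cpu
```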