RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Hi, I'm new to using models and I'm trying to test yours, but when I run your example I get this error: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Do you know how to fix it?
I have an RTX 4090 with 24 GB of VRAM. I tried device_map='auto', 'cuda:0', 'cuda', and 'cpu'; all give the same error.
Here is the full traceback in case it helps:
Traceback (most recent call last):
File "C:\ai\VARCO-VISION-14B-HF.py", line 75, in <module>
output = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
File "C:\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2215, in generate
result = self._sample(
File "C:\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 3206, in _sample
outputs = self(**model_inputs, return_dict=True)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\models\llava_onevision\modeling_llava_onevision.py", line 668, in forward
image_features = self.get_image_features(
File "C:\Python\Python310\lib\site-packages\transformers\models\llava_onevision\modeling_llava_onevision.py", line 525, in get_image_features
image_features = self.vision_tower(pixel_values, output_hidden_states=True)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\models\siglip\modeling_siglip.py", line 1190, in forward
return self.vision_model(
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\models\siglip\modeling_siglip.py", line 1089, in forward
hidden_states = self.embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\models\siglip\modeling_siglip.py", line 311, in forward
patch_embeds = self.patch_embedding(pixel_values) # shape = [*, width, grid, grid]
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
return self._conv_forward(input, self.weight, self.bias)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
return F.conv2d(
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Thank you for your interest! :)
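The error is caused by a dtype mismatch: the processor returns pixel_values as float32 (torch.FloatTensor), while the model weights are loaded in float16 (torch.HalfTensor), so the conv2d in the vision tower's patch embedding receives an input and a weight of different types.
To fix this, I have updated the 'Direct Use' section of the README as follows (surrounding code omitted for brevity):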
import torch
...
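# BatchFeature.to() moves all tensors to `device` but casts only floating-point ones (pixel_values) to float16; input_ids keep their integer dtype
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(device, torch.float16)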
...
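The same idea applies beyond this README: cast only the floating-point inputs to the dtype of the model's weights and leave integer tensors such as input_ids alone. Below is a minimal plain-PyTorch sketch of that idea. `match_input_dtype` is a hypothetical helper, not part of transformers, and the demo deliberately uses float64 inputs against a float32 Conv2d so it runs on CPU; the RTX 4090 case in this thread is the analogous float32-input, float16-weight mismatch.

```python
import torch
import torch.nn as nn


def match_input_dtype(inputs: dict, model: nn.Module) -> dict:
    """Cast floating-point tensors in `inputs` to the dtype of `model`'s weights.

    Hypothetical helper for illustration only. Integer tensors (e.g. input_ids,
    attention_mask) are returned unchanged, since casting token ids to a float
    dtype would break the embedding lookup.
    """
    # Use the dtype of the first floating-point parameter as the target dtype.
    target = next(p.dtype for p in model.parameters() if p.is_floating_point())
    return {
        name: t.to(target) if torch.is_tensor(t) and t.is_floating_point() else t
        for name, t in inputs.items()
    }


# Stand-in for a vision tower's patch embedding: a float32 Conv2d on CPU.
conv = nn.Conv2d(3, 8, kernel_size=3)
batch = {
    "pixel_values": torch.randn(1, 3, 8, 8, dtype=torch.float64),  # wrong dtype on purpose
    "input_ids": torch.tensor([[1, 2, 3]]),
}
fixed = match_input_dtype(batch, conv)
out = conv(fixed["pixel_values"])  # would raise a dtype-mismatch RuntimeError without the cast
print(fixed["pixel_values"].dtype, fixed["input_ids"].dtype, out.dtype)
```

With a transformers model you can also avoid hard-coding the dtype, since PreTrainedModel exposes `dtype` and `device` properties, e.g. `.to(model.device, model.dtype)` on the processor output.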
This worked! Thank you.