RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

#1
by chx2k - opened

Hi, I'm new to using models and I'm trying to test yours, but when I run your example I get this error: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Do you know how to fix it?

I have an RTX 4090 with 24 GB of VRAM. I tried device_map='auto', 'cuda:0', 'cuda', and 'cpu'; all give the same error.
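From the message, the weights look like they are in half precision while the input is full precision, so changing the device alone doesn't help. A quick check like this (variable names assumed from the example script) shows the mismatch:

# Hypothetical diagnostic; `model` and `inputs` come from the example script.
print(model.dtype)                   # torch.float16 -> the HalfTensor weights
print(inputs["pixel_values"].dtype)  # torch.float32 -> the FloatTensor input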

Here is the full traceback in case it helps:

Traceback (most recent call last):
  File "C:\ai\VARCO-VISION-14B-HF.py", line 75, in <module>
    output = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
  File "C:\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2215, in generate
    result = self._sample(
  File "C:\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 3206, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\transformers\models\llava_onevision\modeling_llava_onevision.py", line 668, in forward
    image_features = self.get_image_features(
  File "C:\Python\Python310\lib\site-packages\transformers\models\llava_onevision\modeling_llava_onevision.py", line 525, in get_image_features
    image_features = self.vision_tower(pixel_values, output_hidden_states=True)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\transformers\models\siglip\modeling_siglip.py", line 1190, in forward
    return self.vision_model(
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\transformers\models\siglip\modeling_siglip.py", line 1089, in forward
    hidden_states = self.embeddings(pixel_values, interpolate_pos_encoding=interpolate_pos_encoding)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\transformers\models\siglip\modeling_siglip.py", line 311, in forward
    patch_embeds = self.patch_embedding(pixel_values)  # shape = [*, width, grid, grid]
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 554, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 549, in _conv_forward
    return F.conv2d(
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

Thank you for your interest! :)

The issue occurred because of a mismatch between the input dtype and the weight dtype: the model weights are loaded in float16 (HalfTensor), while the processor returns pixel values in float32 (FloatTensor). To address this, I have updated the 'Direct Use' section of the README as follows, so that the inputs are cast to float16 (surrounding code omitted for brevity).

import torch
...
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(device, torch.float16)
...
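
For anyone else hitting this, here is a fuller sketch of the updated usage. The model id, image path, and prompt construction below are assumptions for illustration; check the README for the exact code.

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

model_name = "NCSOFT/VARCO-VISION-14B-HF"  # assumed repo id
device = "cuda:0"

# The weights are loaded in float16, which is why they show up as HalfTensor.
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map=device
)
processor = AutoProcessor.from_pretrained(model_name)

conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
raw_image = Image.open("image.jpg")  # hypothetical input image

# The processor emits float32 pixel values by default; the extra torch.float16
# argument casts the floating-point tensors so they match the model weights.
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(device, torch.float16)

output = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))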

This worked! Thank you.

kimyoungjune changed discussion status to closed
