Troubleshooting Tensor Dimension Mismatch

#13
by dankoyy - opened

Hello,

When I try to run the example from the model card, I get a tensor size mismatch error.

modeling_phi.py:
Code line 313: padding_mask.masked_fill_(key_padding_mask, 0.0)
RuntimeError: The size of tensor a (749) must match the size of tensor b (750) at non-singleton dimension 1.

The two tensors passed to masked_fill_ differ by one element along dimension 1 (749 vs. 750), so they cannot be combined. Any ideas why this is happening? Thank you.
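To make the error message concrete, here is a minimal pure-Python sketch of the broadcasting rule that PyTorch applies to element-wise operations: sizes are compared dimension by dimension from the right, and each pair must be equal unless one of them is 1. The helper name `first_mismatch` and the shapes `(1, 749)` and `(1, 750)` are assumptions for illustration only; the real shapes inside modeling_phi.py are not shown in the traceback.

```python
def first_mismatch(shape_a, shape_b):
    """Return the index of a dimension where two shapes cannot broadcast,
    or None if they are compatible. Hypothetical helper, not a PyTorch API."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both shapes have the same rank.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Compare from the trailing dimension backwards.
    for dim in range(ndim - 1, -1, -1):
        if a[dim] != b[dim] and 1 not in (a[dim], b[dim]):
            return dim
    return None


if __name__ == "__main__":
    # Assumed shapes: a mask one position shorter than the key padding mask.
    print(first_mismatch((1, 749), (1, 750)))
    # A singleton dimension is allowed to stretch, so this pair is compatible.
    print(first_mismatch((1, 749), (1, 1)))
```

An off-by-one difference like this usually means the two tensors were built from different ideas of the sequence length, for example one counting a cached token and the other not.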

Operating system: Windows 11 Home
Operating system version: 10.0.22631
Python version: 3.12.2
PyTorch version: 2.2.1+cu121
Torchvision version: 2.2.1+cu121
CUDA version: 12.1
CUDNN version: 8801
Current CUDA device: 0
Number of CUDA devices available: 1
Name of current CUDA device: NVIDIA GeForce RTX 3070 Ti
Check if CUDA is available: True

dankoyy changed discussion title from Troubleshooting Tensor Dimension Mismatch in Deep Learning Models to Troubleshooting Tensor Dimension Mismatch

The 4.38.0 release of transformers introduced a backward-incompatible change to the KV cache. Three options:

  1. Use moondream2, where the issue is fixed.
  2. Downgrade transformers to 4.37.2.
  3. Try the patch mentioned here.
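For option 2, it can help to confirm which transformers version is actually installed before and after downgrading (`pip install transformers==4.37.2`). The sketch below uses only the standard library. The function names are hypothetical, and it assumes that every release from 4.38.0 onward carries the KV cache change, which this thread does not confirm beyond 4.38.0 itself.

```python
import re
from importlib import metadata

# First release reported in this thread to contain the KV cache change.
FIRST_CHANGED = (4, 38, 0)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def has_kv_cache_change(version_text):
    """True if the version is 4.38.0 or later (assumption: later releases keep the change)."""
    return parse_version(version_text) >= FIRST_CHANGED


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_transformers_version()
    if version is None:
        print("transformers is not installed in this environment")
    elif has_kv_cache_change(version):
        print(f"transformers {version}: includes the KV cache change")
    else:
        print(f"transformers {version}: predates the KV cache change")
```

Run it in the same environment that loads the model, since a different virtual environment may have a different transformers version.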
