RuntimeError: shape '[1, 60, 64, 128]' is invalid for input of size 61440
I have been trying to run the example, and so far I have ended up with the following error:
File ~/anaconda3/envs/triton/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:261 in forward
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 60, 64, 128]' is invalid for input of size 61440
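The numbers in the error can be checked by hand. The sketch below assumes the published Llama-2-70B config values (hidden_size=8192, num_attention_heads=64, num_key_value_heads=8); these are not stated in this thread. Under those assumptions, `k_proj` only produces 8 × 128 = 1024 features per token, so the key tensor holds 1 × 60 × 1024 = 61440 elements. Viewing it with 64 heads asks for eight times that many.

```python
# Hypothetical helper: check whether a requested view() shape fits a tensor
# with a given number of elements. Config values below are assumptions taken
# from the published Llama-2-70B config, not from this thread.
from math import prod


def fits(shape: tuple, numel: int) -> bool:
    """True if a tensor with `numel` elements can be viewed as `shape`."""
    return prod(shape) == numel


bsz, q_len = 1, 60
num_heads = 64                 # query heads (num_attention_heads)
num_key_value_heads = 8        # key/value heads under grouped-query attention
head_dim = 8192 // num_heads   # hidden_size / num_heads = 128

# k_proj projects to num_key_value_heads * head_dim = 1024 features,
# so the key tensor holds bsz * q_len * 1024 elements.
k_numel = bsz * q_len * num_key_value_heads * head_dim

old_view = (bsz, q_len, num_heads, head_dim)            # shape requested in the traceback
gqa_view = (bsz, q_len, num_key_value_heads, head_dim)  # shape that matches the data

print(k_numel)                  # 61440
print(prod(old_view))           # 491520
print(fits(old_view, k_numel))  # False, hence the RuntimeError
print(fits(gqa_view, k_numel))  # True
```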
The issue is generally with the transformers version. You will need transformers>=4.31.0 to make this work.
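A minimal sketch for confirming which transformers version is actually installed in the active environment, using only the standard library. The helper names `parse_release` and `meets_minimum` are hypothetical; the comparison looks at release numbers only and ignores pre-release tags such as `.dev0`, so `packaging.version` is the stricter choice if it is available. Passing this check only rules out the version requirement above; it does not guarantee the error goes away in every setup.

```python
# Minimal sketch: compare the installed transformers version with the
# 4.31.0 minimum, using only the standard library.
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Return the leading numeric components, e.g. '4.31.0.dev0' -> (4, 31, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def meets_minimum(installed: str, required: str = "4.31.0") -> bool:
    """Compare release numbers only; pre-release tags are ignored."""
    a, b = parse_release(installed), parse_release(required)
    # Pad with zeros so '4.31' compares equal to '4.31.0'.
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        status = "OK" if meets_minimum(installed) else "too old, upgrade to >= 4.31.0"
        print(f"transformers {installed}: {status}")
```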
Thanks, that seems to have been the problem.
How do I solve it?
I upgraded transformers to 4.31.0, but that didn't solve it.
One strange thing: the 7B and 13B models work, but 70B fails.
I have the same issue with the 70B version of the models.
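A likely reason only 70B fails: according to the published Llama 2 configs (an assumption, not stated in this thread), 7B and 13B use one key/value head per query head, while 70B uses grouped-query attention, where 64 query heads share 8 key/value heads. Code that views keys with `num_heads` therefore works for 7B/13B but not for 70B. GQA-aware code views keys with `num_key_value_heads` and then repeats each key/value head for its group of query heads, which is the idea behind the `repeat_kv` helper in newer transformers releases. The sketch below shows only that head mapping on plain Python lists; `repeat_kv_heads` is a hypothetical name, not the library's tensor implementation.

```python
# Sketch of the grouped-query attention (GQA) head mapping. Head counts are
# assumptions from the published Llama 2 configs, not from this thread.


def repeat_kv_heads(kv_heads: list, num_heads: int) -> list:
    """Repeat each key/value head so there is one entry per query head.

    Consecutive query heads share a KV head: query head h uses
    KV head h // n_rep, where n_rep = num_heads // len(kv_heads).
    """
    num_kv = len(kv_heads)
    if num_heads % num_kv != 0:
        raise ValueError("num_heads must be a multiple of num_key_value_heads")
    n_rep = num_heads // num_kv
    return [head for head in kv_heads for _ in range(n_rep)]


# Llama-2-70B style: 64 query heads share 8 KV heads (n_rep = 8).
kv_70b = [f"kv{i}" for i in range(8)]
expanded_70b = repeat_kv_heads(kv_70b, num_heads=64)

# Llama-2-7B style: 32 query heads and 32 KV heads (n_rep = 1, plain multi-head attention).
kv_7b = [f"kv{i}" for i in range(32)]
expanded_7b = repeat_kv_heads(kv_7b, num_heads=32)

print(len(expanded_70b), expanded_70b[:9])
print(expanded_7b == kv_7b)
```

With n_rep = 1 the mapping is the identity, which is why the older `num_heads`-based view never showed a problem on the smaller models.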
You also need python>=3.8 to address this issue.
Same issue (but with the Llama-3-8B model).
python=3.9 and transformers==4.41.0 don't work :/
Any solution?