Running SFT with PEFT fails with an error in RotaryEmbedding

#34
by Andcircle - opened

I'm trying to run SFT with PEFT, following this gist: https://gist.github.com/pacman100/1731b41f7a90a87b457e8c5415ff1c14

RotaryEmbedding raises this error:
return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)
TypeError: unsupported operand type(s) for *: 'Tensor' and 'NoneType'

I changed this line https://huggingface.co/tiiuae/falcon-7b/blob/2f5c3cd4eace6be6c0f12981f377fb35e5bf6ee5/modelling_RW.py#L73 to:
if seq_len != self.seq_len_cached or self.cos_cached is None or self.sin_cached is None:
With that change, training runs without the error.
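
To show what the extra check does, here is a minimal pure-Python sketch of the cache guard. It is not the real modelling_RW.py code: `RotaryCacheSketch` and `make_inconsistent` are hypothetical names, plain lists stand in for tensors, and the inconsistent state (sequence length recorded, tables missing) is forced by hand because I don't know how it arises in the real run.

```python
import math


class RotaryCacheSketch:
    """Simplified stand-in for the cos/sin cache; not the real modelling_RW.py class."""

    def __init__(self, head_dim, base=10000.0):
        # One inverse frequency per pair of head dimensions.
        self.inv_freq = [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]
        self.seq_len_cached = None
        self.cos_cached = None
        self.sin_cached = None

    def _cache_is_stale(self, seq_len, guarded):
        # Original check: recompute only when the sequence length changed.
        stale = seq_len != self.seq_len_cached
        if guarded:
            # Workaround from this post: also recompute when either table is missing.
            stale = stale or self.cos_cached is None or self.sin_cached is None
        return stale

    def cos_sin(self, seq_len, guarded=True):
        if self._cache_is_stale(seq_len, guarded):
            self.seq_len_cached = seq_len
            # angles[t][j] = position t times inverse frequency j
            angles = [[t * f for f in self.inv_freq] for t in range(seq_len)]
            self.cos_cached = [[math.cos(a) for a in row] for row in angles]
            self.sin_cached = [[math.sin(a) for a in row] for row in angles]
        return self.cos_cached, self.sin_cached


def make_inconsistent(rope, seq_len):
    """Hypothetical helper: force the state 'length recorded, tables missing'."""
    rope.seq_len_cached = seq_len
    rope.cos_cached = None
    rope.sin_cached = None
    return rope


if __name__ == "__main__":
    rope = make_inconsistent(RotaryCacheSketch(head_dim=4), seq_len=3)
    # Unguarded check returns the missing tables, which is what breaks `q * cos`.
    print(rope.cos_sin(3, guarded=False))
    # Guarded check rebuilds the tables instead.
    print(rope.cos_sin(3, guarded=True)[0] is not None)
```

The guard only hides the symptom: it rebuilds the tables whenever they are missing, but it does not explain why they were missing in the first place.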

But I can't find anywhere that self.sin_cached is set to None. Any hints? Thanks.

I'm using a cluster with 4 A10G GPUs,
CUDA version 12.0
torch version 2.0.1-cu118
