About Temporal Positional Encoding
Is there an error in this code?
Maybe we should change pos_angle[:, 1::2] = torch.cos(pos_angle[:, 0::2]) to pos_angle[:, 1::2] = torch.cos(pos_angle[:, 1::2])?
def get_angle(self, position):
    pos_angle = self.angle.reshape(1, -1).to(position.device) * position.reshape(-1, 1)
    pos_angle[:, 0::2] = torch.sin(pos_angle[:, 0::2])
    pos_angle[:, 1::2] = torch.cos(pos_angle[:, 0::2])
    pos_angle = pos_angle.unsqueeze(1)
    return pos_angle
This is a typo. However, the result of ‘pos_angle’ is still correct, because the angles at the odd and even positions of hidden_dim are equal:
modeling_kangaroo.py#L1080
self.angle = torch.stack([1 / torch.pow(torch.tensor(10000), torch.tensor(2 * (hid_j // 2) / hidden_dim)) for hid_j in range(hidden_dim)])
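The claim that each even/odd pair shares one angle can be checked directly. The following is a minimal sketch that rebuilds the angle vector with the same expression as the line above; hidden_dim = 8 is an arbitrary illustrative size, not the model's actual value.

```python
import torch

hidden_dim = 8  # assumed small size for illustration only

# Same construction as self.angle: hid_j // 2 gives indices 2k and 2k+1
# the same exponent, so each (even, odd) pair gets the same frequency.
angle = torch.stack([
    1 / torch.pow(torch.tensor(10000), torch.tensor(2 * (hid_j // 2) / hidden_dim))
    for hid_j in range(hidden_dim)
])

# Compare the even-indexed entries with the odd-indexed entries.
pairs_equal = torch.equal(angle[0::2], angle[1::2])
print(pairs_equal)
```

This only confirms that the raw angles are paired. It says nothing about what happens once the even columns are overwritten in place.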
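I know that. But by the time pos_angle[:, 1::2] is calculated, pos_angle[:, 0::2] has already been overwritten by pos_angle[:, 0::2] = torch.sin(pos_angle[:, 0::2]). So the cosine is applied to sin(angle) instead of the angle itself, and the odd columns end up as cos(sin(angle)).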