Is attention mask wrong for batch generation?

#33
by qingsonglv - opened
Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

For batch generation, the attention_mask is set to a single scalar 1, per this line: https://huggingface.co/THUDM/chatglm-6b/blob/main/modeling_chatglm.py#L948

However, for a batch of sequences with different lengths, the left-padded tokens are not masked out in this case.
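For illustration, here is a minimal sketch (plain PyTorch, not ChatGLM-specific; the pad id and sequence lengths are made up) of the per-token mask one would expect for a left-padded batch, as opposed to a single scalar 1:

```python
import torch

# Two left-padded prompts of different lengths; 0 is a hypothetical pad token id.
PAD_ID = 0
input_ids = torch.tensor([
    [PAD_ID, PAD_ID, 11, 12, 13],   # real length 3, left-padded to 5
    [21, 22, 23, 24, 25],           # real length 5, no padding
])

# Per-token mask: 0 over the left padding, 1 over the real tokens.
attention_mask = (input_ids != PAD_ID).long()
print(attention_mask)
# tensor([[0, 0, 1, 1, 1],
#         [1, 1, 1, 1, 1]])
```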


I guess the position ids have the same problem.
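For comparison, a common way to derive padding-aware position ids from the attention mask, as several Hugging Face decoder models do during generation (only an illustration; ChatGLM-6B uses its own 2D position-id scheme):

```python
import torch

# Left-padded batch mask from the example above.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1],   # left-padded sequence
    [1, 1, 1, 1, 1],   # full-length sequence
])

# Positions count real tokens only; the value used for pad slots is arbitrary.
position_ids = attention_mask.cumsum(dim=-1) - 1
position_ids.masked_fill_(attention_mask == 0, 0)
print(position_ids)
# tensor([[0, 0, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```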

edited Apr 10, 2023

Seems like it was my fault... there's no bug.

qingsonglv changed discussion status to closed
