When running the example, got ValueError: Attention mask should be of size (1, 1, 1, 30), but is torch.Size([1, 1, 1, 29])
When running the given example with transformers==4.48.2:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "moonshotai/Moonlight-16B-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a helpful assistant provided by Moonshot-AI."},
    {"role": "user", "content": "Is 123 a prime?"}
]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(inputs=input_ids, max_new_tokens=500)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
I got the error:
ValueError: Attention mask should be of size (1, 1, 1, 30), but is torch.Size([1, 1, 1, 29])
Something seems to be wrong with how the attention mask is prepared in prepare_inputs_for_generation, I guess?
@Phando I compared modeling_deepseek.py against the one in the DeepSeek-V3 repo, and changing line 1656 from max_cache_length = past_key_values.get_seq_length() to max_cache_length = past_key_values.get_max_length() solves the issue for me. Not sure why Moonshot modified this, though.
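For context, here is a minimal, self-contained sketch of why that change matters. It only illustrates the cache-cropping check that DeepSeek-style prepare_inputs_for_generation implementations carry over from older Llama code; the numbers are taken from the traceback (29 prompt tokens, first decode step), and this is not a verbatim excerpt of modeling_deepseek.py:

import torch

# After the 29-token prompt has been processed, generate() calls the model one token at a time.
cache_length = 29                                          # tokens already stored in the KV cache
input_len = 1                                              # one new token per decode step
attention_mask = torch.ones(1, cache_length + input_len)   # 30 columns: prompt + new token

# Buggy variant: max_cache_length taken from get_seq_length(), i.e. the current cache length.
max_cache_length = cache_length
if max_cache_length is not None and cache_length + input_len > max_cache_length:
    attention_mask = attention_mask[:, -max_cache_length:]  # cropped to 29 columns

print(attention_mask.shape)  # torch.Size([1, 29]) -> later expanded to (1, 1, 1, 29), while kv_seq_len is 30

# With get_max_length(), a DynamicCache returns None, the crop never triggers,
# and the mask keeps all 30 columns, matching kv_seq_len.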
@toothacher17 Can you help check this?
I can reproduce the error:
File ~/.cache/huggingface/modules/transformers_modules/moonshotai/Moonlight-16B-A3B-Instruct/930af5a36e0ed715651ee8fb0caccc8bc2d613b5/modeling_deepseek.py:829, in DeepseekV3Attention.forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache, **kwargs)
827 if attention_mask is not None:
828 if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
--> 829 raise ValueError(
830 f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
831 )
832 attn_weights = attn_weights + attention_mask
834 # upcast attention to fp32
ValueError: Attention mask should be of size (1, 1, 1, 30), but is torch.Size([1, 1, 1, 29])
Could you help check whether my change fixes it?
https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct/discussions/10
We're sorry for the mistakes in the recent update. The code has been updated, so please give it another try. Let us know if there are still any problems.
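A side note for anyone still seeing the old behavior (this is my own assumption about caching, not something the maintainers stated): with trust_remote_code=True the modeling file is cached per commit under ~/.cache/huggingface/modules/transformers_modules/ (the same path that appears in the traceback above), so re-running from_pretrained without a pinned revision should normally pick up the fixed file on its own. If a stale copy seems to stick around, something like the sketch below forces a refresh; force_download is a standard from_pretrained/Hub argument, and the cache path is inferred from the traceback, not documented by this repo:

import shutil
from pathlib import Path
from transformers import AutoModelForCausalLM

# Drop the cached remote-code modules for this repo (path inferred from the traceback above).
cached_modules = Path.home() / ".cache/huggingface/modules/transformers_modules/moonshotai/Moonlight-16B-A3B-Instruct"
shutil.rmtree(cached_modules, ignore_errors=True)

# Re-resolve the latest revision and re-download the repo files from the Hub.
model = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Moonlight-16B-A3B-Instruct",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
    force_download=True
)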