Issue in mla.py

#1
by NingXu24 - opened

In the mla.py file:

 d_model: total size of the model 

So I think d_model is the hidden dimension. In MLA, however, the head dimension is not equal to hidden_dimension / number_of_heads; it is larger, as explained in the article "DeepSeek V3 详细解读:模型&Infra 建设" (DeepSeek V3 in-depth analysis: model and infrastructure). Yet at line 107 the code uses exactly that equation to compute the head dimension.
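To make the mismatch concrete, here is a minimal sketch using the attention sizes from the published DeepSeek-V3 config (hidden_size 7168, 128 heads, qk_nope_head_dim 128, qk_rope_head_dim 64, v_head_dim 128). The `MLAConfig` class and its method names are hypothetical and are not taken from mla.py; the sketch only compares the naive d_model // n_heads value with the per-head sizes MLA actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLAConfig:
    # Hypothetical container; default values follow DeepSeek-V3's config.json.
    d_model: int = 7168          # hidden_size
    n_heads: int = 128           # num_attention_heads
    qk_nope_head_dim: int = 128  # per-head query/key part without RoPE
    qk_rope_head_dim: int = 64   # per-head decoupled RoPE part
    v_head_dim: int = 128        # per-head value dimension

    @property
    def naive_head_dim(self) -> int:
        # What standard multi-head attention would derive: d_model // n_heads.
        return self.d_model // self.n_heads

    @property
    def qk_head_dim(self) -> int:
        # In MLA the query/key head is the concatenation of both parts.
        return self.qk_nope_head_dim + self.qk_rope_head_dim

    def projection_shapes(self) -> dict:
        # Widths that depend on the per-head sizes rather than on d_model.
        return {
            "q_up_out": self.n_heads * self.qk_head_dim,
            "o_proj": (self.n_heads * self.v_head_dim, self.d_model),
        }


cfg = MLAConfig()
print(cfg.naive_head_dim)        # 56
print(cfg.qk_head_dim)           # 192
print(cfg.projection_shapes())   # {'q_up_out': 24576, 'o_proj': (16384, 7168)}
```

Because both 192 and 128 exceed 56, n_heads * head_dim no longer equals d_model, so an MLA module would presumably need these per-head sizes passed in explicitly instead of deriving them from d_model.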
