Issue in mla.py
#1, opened by NingXu24
In the mla.py file:
d_model: total size of the model
So I think d_model is the hidden dimension. In MLA, however, the head dimension is not equal to hidden_dimension / number_of_heads; it is larger, as explained in "DeepSeek V3 详细解读:模型&Infra 建设" (DeepSeek V3 in Detail: Model and Infra Construction). But at line 107 the code still uses the equation above to compute the head dimension.
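To make the mismatch concrete, here is a minimal sketch comparing the two ways of getting the head dimension. The class and field names (`MLADims`, `naive_head_dim`, and so on) are hypothetical and not taken from mla.py. The numbers are the ones I believe the published DeepSeek-V3 config uses (hidden size 7168, 128 heads, 128 + 64 for the query/key head, 128 for the value head), so please treat them as an assumption and check them against the config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLADims:
    """Hypothetical container for MLA sizes; not the API of mla.py."""

    d_model: int           # hidden dimension of the model
    n_heads: int           # number of attention heads
    qk_nope_head_dim: int  # per-head query/key size without positional encoding
    qk_rope_head_dim: int  # per-head query/key size carrying RoPE
    v_head_dim: int        # per-head value size

    @property
    def naive_head_dim(self) -> int:
        # What standard multi-head attention would use.
        return self.d_model // self.n_heads

    @property
    def qk_head_dim(self) -> int:
        # In MLA the query/key head size is configured independently of d_model.
        return self.qk_nope_head_dim + self.qk_rope_head_dim

    @property
    def out_proj_in_features(self) -> int:
        # Input width of the output projection: n_heads * v_head_dim, not d_model.
        return self.n_heads * self.v_head_dim


# Assumed DeepSeek-V3 values; verify against the released config.
dims = MLADims(
    d_model=7168,
    n_heads=128,
    qk_nope_head_dim=128,
    qk_rope_head_dim=64,
    v_head_dim=128,
)

print(dims.naive_head_dim)        # 56
print(dims.qk_head_dim)           # 192
print(dims.out_proj_in_features)  # 16384
```

With these values the head dimension from d_model / n_heads is 56, while the query/key head actually has 192 channels and the value head has 128. So the head sizes probably need to be separate parameters instead of being derived from d_model, and the output projection would map n_heads * v_head_dim back to d_model.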