Question About MLA Usage?
#1 opened by splendor1811
I’m a bit confused about how MLA is actually used. When you load the model with `AutoModelForCausalLM`, as shown in your `README.md` file, Hugging Face automatically loads it with the original LLaMA architecture, which would mean GQA rather than MLA. When you converted GQA to MLA, did you use the processing code from the paper's GitHub repository?
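To make the concern concrete: `AutoModelForCausalLM` chooses the model class from the `model_type` recorded in the checkpoint's `config.json`, so a config that still says `"llama"` is loaded as `LlamaForCausalLM` with its standard (GQA) attention. Below is a minimal sketch of a heuristic for checking what a config describes. It assumes that an MLA-style config exposes a low-rank KV field named `kv_lora_rank`, as DeepSeek-V2-style configs do; whether a TransMLA-converted checkpoint uses that key is an assumption. `describe_attention` is a hypothetical helper, not part of any library.

```python
import json


def describe_attention(config: dict) -> str:
    """Heuristically classify the attention layout in a config.json dict.

    Hypothetical helper. Assumption: MLA configs carry a "kv_lora_rank" key
    (as DeepSeek-V2-style configs do); plain LLaMA configs do not.
    """
    if "kv_lora_rank" in config:
        return "MLA"
    n_heads = config.get("num_attention_heads")
    if n_heads is None:
        return "unknown"
    # LLaMA configs fall back to num_attention_heads when num_key_value_heads is absent.
    n_kv = config.get("num_key_value_heads", n_heads)
    if n_kv == n_heads:
        return "MHA"  # one KV head per query head
    if n_kv == 1:
        return "MQA"  # a single shared KV head
    return "GQA"  # query heads share a smaller set of KV heads


# Example: a LLaMA-style config with 32 query heads and 8 KV heads.
llama_like = json.loads(
    '{"model_type": "llama", "architectures": ["LlamaForCausalLM"],'
    ' "num_attention_heads": 32, "num_key_value_heads": 8}'
)
print(llama_like["model_type"], describe_attention(llama_like))  # llama GQA
```

If the uploaded `config.json` still reports `model_type: "llama"` with no MLA-specific fields, the stock loader will build GQA attention regardless of how the weights were produced.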
@splendor1811 Hello, the standard Transformers library doesn't support it. I made this tool for the conversion process: https://github.com/bet0x/transmla-converter. You can also check the code from the original paper: https://github.com/fxmeng/TransMLA.