Question About MLA Usage?

#1
by splendor1811 - opened

I’m a bit confused about how MLA is actually used here. When the model is loaded with AutoModelForCausalLM, as shown in your README.md, Hugging Face instantiates the original LLaMA architecture, which would mean GQA rather than MLA. So I’d like to ask: when you converted GQA to MLA, did you use the code from the paper's GitHub repository?

@splendor1811 Hello. The standard transformers library doesn't support it, so I made this for the conversion process: https://github.com/bet0x/transmla-converter. You can also check the code from the original paper: https://github.com/fxmeng/TransMLA.
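
For anyone wanting to check which attention variant a checkpoint's `config.json` describes before loading it, here is a minimal sketch. It is not taken from either repository above. The helper name `attention_kind` and the two sample configs are hypothetical, and it assumes an MLA checkpoint exposes a DeepSeek-style `kv_lora_rank` field, which may not match how a given converted model is packaged.

```python
import json


def attention_kind(config: dict) -> str:
    """Guess the attention variant from a Hugging Face style config dict."""
    # Assumption: MLA checkpoints carry a DeepSeek-style `kv_lora_rank` field.
    if config.get("kv_lora_rank") is not None:
        return "MLA"
    n_heads = config["num_attention_heads"]
    # When `num_key_value_heads` is absent, treat it as equal to the head count.
    n_kv = config.get("num_key_value_heads") or n_heads
    if n_kv == n_heads:
        return "MHA"  # one KV head per query head
    if n_kv == 1:
        return "MQA"  # a single KV head shared by all query heads
    return "GQA"      # query heads grouped over fewer KV heads


# Hypothetical config contents, for illustration only.
llama_like = json.loads(
    '{"model_type": "llama", "architectures": ["LlamaForCausalLM"],'
    ' "num_attention_heads": 32, "num_key_value_heads": 8}'
)
mla_like = {"num_attention_heads": 32, "kv_lora_rank": 512}

print(attention_kind(llama_like))  # GQA
print(attention_kind(mla_like))    # MLA
```

If the `architectures` entry in the real `config.json` is a stock LLaMA class, AutoModelForCausalLM will build that class, so a check like this only tells you what the config declares, not what the weights were converted to.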
