Any plans to use RMSNorm (or FlashNorm) instead of LayerNorm?

#12
by graefics - opened

Llama and many other LLMs use RMSNorm. Any reason why you still use LayerNorm? Thanks

FlashNorm: https://arxiv.org/abs/2407.09577
RMSNorm: https://arxiv.org/abs/1910.07467

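Yes.
I think the difference is that LayerNorm computes both the mean and the variance of its input, while RMSNorm skips the mean and only rescales by the root mean square, which is cheaper.
That is why many LLMs use RMSNorm.
Of course, you can still use LayerNorm.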
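
To make the difference concrete, here is a minimal pure-Python sketch of the two normalizations. It is only an illustration, not this model's actual code: the function names `layer_norm` and `rms_norm` and the `eps` value are my own assumptions, and the learned scale and bias parameters are left out.

```python
import math

def layer_norm(x, eps=1e-6):
    # LayerNorm (learned scale/bias omitted): subtract the mean,
    # then divide by the standard deviation.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-6):
    # RMSNorm (learned scale omitted): divide by the root mean square.
    # No mean is computed or subtracted.
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    return [v / math.sqrt(mean_sq + eps) for v in x]

x = [1.0, 2.0, 3.0, 4.0]
ln_out = layer_norm(x)   # centered: values sum to roughly zero
rms_out = rms_norm(x)    # not centered: signs of the inputs are preserved
```

For an input whose mean is already zero, the two functions give the same result, so the only thing RMSNorm drops is the re-centering step.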
