Any plans to use RMSNorm (or FlashNorm) instead of LayerNorm?

#12
by graefics - opened

Llama and many other LLMs use RMSNorm. Any reason why you still use LayerNorm? Thanks

FlashNorm: https://arxiv.org/abs/2407.09577
RMSNorm: https://arxiv.org/abs/1910.07467

Yes.
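As I understand it, LayerNorm normalizes the input using both its mean and its variance, whereas RMSNorm skips the mean subtraction and only divides by the root mean square (RMS) of the input.
That simpler computation is why many LLMs use RMSNorm.
Of course, you can still use LayerNorm.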
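A minimal plain-Python sketch of the difference described in the reply. It is written for illustration only and is not this model's code. The function names `layer_norm` and `rms_norm`, the per-feature `gamma`/`beta` parameters, and the shared `eps` default are assumptions. LayerNorm re-centers and rescales, while RMSNorm only rescales, so the two give the same result when the input already has zero mean.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """LayerNorm: subtract the mean, divide by the std, then scale by gamma and shift by beta."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased (population) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm: no mean subtraction and no bias; divide by the RMS, then scale by gamma."""
    n = len(x)
    mean_sq = sum(v * v for v in x) / n
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for v, g in zip(x, gamma)]

if __name__ == "__main__":
    ones, zeros = [1.0] * 4, [0.0] * 4
    x = [1.0, 2.0, 3.0, 4.0]
    print(layer_norm(x, ones, zeros))  # output re-centered to mean ~0
    print(rms_norm(x, ones))           # output keeps a positive mean
    z = [-3.0, -1.0, 1.0, 3.0]         # zero-mean input: both norms agree
    print(layer_norm(z, ones, zeros), rms_norm(z, ones))
```

Because RMSNorm drops the mean and the bias `beta`, it needs one fewer reduction over the features. That is the main source of the speed-up argued for in the RMSNorm paper.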
