Clarifications about the config properties w.r.t. the paper

#5
by TomSchelsen

In https://huggingface.co/answerdotai/ModernBERT-base/blob/main/config.json, we see "hidden_activation": "gelu" and "position_embedding_type": "absolute" (even though RoPE-related settings appear in the config as well), whereas the paper says that GeGLU and RoPE are used, respectively. Is this expected behavior coming from the transformers library itself, or is it a misconfiguration in the export? Thanks

Answer.AI org

As we mention in the paper, GeGLU is GLU with GeLU in place of the sigmoid gate, so "hidden_activation": "gelu" is correct.

We adopt GeGLU (Shazeer, 2020), a Gated-Linear Units (GLU)-based (Dauphin et al., 2017) activation function built on top of the original BERT’s GeLU.
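For concreteness, here is a minimal PyTorch sketch of the idea, not ModernBERT's actual implementation (the class and dimension names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """GLU-style gated layer where the gate uses GeLU instead of sigmoid."""

    def __init__(self, dim_in: int, dim_hidden: int):
        super().__init__()
        # A single projection produces both the value and the gate halves.
        self.proj = nn.Linear(dim_in, 2 * dim_hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        value, gate = self.proj(x).chunk(2, dim=-1)
        # The original GLU would compute: value * torch.sigmoid(gate)
        return value * F.gelu(gate)
```

So the "gelu" in the config names the activation used inside the gating, not a plain GeLU feed-forward.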

I believe position_embedding_type is a default config argument in transformers. ModernBERT doesn't use it; I'll have to check if we can remove it from the config.
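If you want to confirm which settings the model actually reads, a quick check along these lines should work (assuming the RoPE keys visible in the linked config.json, e.g. global_rope_theta / local_rope_theta):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("answerdotai/ModernBERT-base")

# RoPE parameters that ModernBERT actually uses:
print(config.global_rope_theta)  # rotary base for global attention layers
print(config.local_rope_theta)   # rotary base for local attention layers

# Inherited default that the model ignores:
print(getattr(config, "position_embedding_type", None))
```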
