MRL and linear layers

#19
by bobox - opened

First of all, congratulations and thank you for the good model!

I have just a few questions...

If I understand Matryoshka Representations Learning correctly, then I should ask: What is the need for the linear layers on top of the model? For example, regarding the layer that reduces dimensionality from the 1536 dimensions of the model embeddings to 256, wouldn't MRL allow you to just truncate the original dimensions to the required size?
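For context on what "just truncate" means here, a minimal NumPy sketch of MRL-style dimensionality reduction (not the authors' code — the shapes and the renormalization step are my assumptions):

```python
import numpy as np

# Toy stand-ins for full-dimensional sentence embeddings (1536-d, as discussed).
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 1536))

def truncate_and_renormalize(embeddings, dim):
    """MRL-style reduction: keep the first `dim` coordinates, then L2-normalize
    so cosine similarity still behaves as expected on the shortened vectors."""
    cut = embeddings[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

small = truncate_and_renormalize(emb, 256)
print(small.shape)  # (4, 256)
```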

Also, did you notice a relevant improvement in accuracy just from scaling up dimensions with a linear layer (like the layers that scale to 2048 or 8192)?
And what is the difference between "2_dense" and "2_dense_8192"?

Also, out of curiosity, why did you choose Qwen 1.5 over DeBERTa-v2-xxl (1.5B)?

thanks in advance!

StellaEncoder org

Hi bobox,

What is the need for the linear layers on top of the model? Wouldn't MRL allow you to just truncate the original dimensions to the required size?

In our experiments, adding linear layers on top of the model performs better than truncating the original dimensions to the required size.
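To make the contrast concrete, a hedged sketch of what such a linear (dense) layer does — here with random weights standing in for the trained ones, so this illustrates the mechanism only, not the released checkpoints:

```python
import numpy as np

rng = np.random.default_rng(1)
emb = rng.normal(size=(4, 1536))

# A learned projection, unlike truncation, can mix information from all
# 1536 input dimensions into the 256 output dimensions. W and b here are
# random placeholders; in the real model they are trained with the encoder.
W = rng.normal(size=(1536, 256)) / np.sqrt(1536)
b = np.zeros(256)

projected = emb @ W + b
projected /= np.linalg.norm(projected, axis=1, keepdims=True)
print(projected.shape)  # (4, 256)
```

The intuition for why this can beat truncation: truncation discards the last 1280 coordinates outright, while a trained projection keeps a (learned) linear combination of everything.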

did you notice a relevant improvement in accuracy just scaling up dimensions with a linear layer

Yes, it gives a relevant improvement.

what is the difference between "2_dense" and "2_dense_8192"?

"2_dense" and "2_dense_8192" are the same thing.

Also, out of curiosity, why did you choose Qwen 1.5 over DeBERTa-v2-xxl (1.5B)?

I haven't paid attention to that model lately, so if it performs well on your tasks, let me know and I'll try it later!
