How did you train m3-retromae?

#66
by hotchpotch - opened

Hello! bge-m3 is a great approach: it performs well and I love it. Thank you for the great models and the published papers.

I would like to know about the 8192-token support of XLM-RoBERTa, as I could not find it in the paper.
Is it correct that you first set the max_position_embeddings of XLM-RoBERTa to 8194 and then created bge-m3-retromae by training on long sequences with RetroMAE?
I would also appreciate it if you could tell me what training dataset you used at that stage, if possible.

Beijing Academy of Artificial Intelligence org

Thanks for your attention to our work!
We extended the max_position_embeddings of XLM-RoBERTa to 8194 and trained the model on the Pile, mC4, and Wudao datasets with the RetroMAE loss.
For the pre-training details, please refer to Appendix B.1 of our paper.
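For readers who want to reproduce this step, below is a minimal sketch (not the authors' actual script) of extending XLM-RoBERTa's learned position embeddings to 8194 with Hugging Face `transformers`; the copy-and-tile warm-start and the buffer refresh are assumptions for illustration, and the RetroMAE pre-training itself is not shown.

```python
import torch
from transformers import XLMRobertaModel

model = XLMRobertaModel.from_pretrained("xlm-roberta-large")

new_max_pos = 8194  # 8192 tokens plus the 2 offset positions RoBERTa-style models reserve
old_embeddings = model.embeddings.position_embeddings.weight.data
old_max_pos, dim = old_embeddings.shape  # 514, 1024 for xlm-roberta-large

# Allocate a larger position-embedding table.
new_embeddings = torch.empty(new_max_pos, dim)

# Copy the original learned positions, then tile them to fill the rest
# (one common way to warm-start longer-context training; an assumption here,
# not necessarily what the authors did).
new_embeddings[:old_max_pos] = old_embeddings
for start in range(old_max_pos, new_max_pos, old_max_pos - 2):
    end = min(start + old_max_pos - 2, new_max_pos)
    new_embeddings[start:end] = old_embeddings[2 : 2 + (end - start)]

# Swap in the enlarged embedding table, keeping the original padding index.
new_pos_emb = torch.nn.Embedding(
    new_max_pos, dim, padding_idx=model.embeddings.position_embeddings.padding_idx
)
new_pos_emb.weight.data.copy_(new_embeddings)
model.embeddings.position_embeddings = new_pos_emb
model.config.max_position_embeddings = new_max_pos

# Depending on the transformers version, the embeddings module caches
# position_ids / token_type_ids buffers sized to the old maximum; refresh
# them so sequences longer than 514 tokens pass through the embedding layer.
model.embeddings.register_buffer(
    "position_ids", torch.arange(new_max_pos).unsqueeze(0), persistent=False
)
model.embeddings.register_buffer(
    "token_type_ids", torch.zeros(1, new_max_pos, dtype=torch.long), persistent=False
)

model.save_pretrained("xlm-roberta-large-8194")
```

The resized checkpoint saved at the end would then serve as the starting point for RetroMAE-style pre-training on long sequences from the datasets mentioned above.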

Thank you!

I have also read Appendix B.1, which deepened my understanding. I'm very grateful.
