How did you train m3-retromae?

#66
by hotchpotch - opened

Hello! bge-m3 is a great approach: it performs well and I love it. Thank you for the great models and the published papers.

I would like to know about the 8192-token support of XLM-RoBERTa, as I could not find it in the paper.
Is it correct that you first set the max_position_embeddings of XLM-RoBERTa to 8194 and then created bge-m3-retromae by training on long sequences with RetroMAE?
I would also appreciate it if you could tell me what training dataset you used at that stage, if possible.

Beijing Academy of Artificial Intelligence org

Thanks for your attention to our work!
We extended the max_position_embeddings of XLM-RoBERTa to 8194 and trained the model on the Pile, mC4, and Wudao datasets with the RetroMAE loss.
For the pre-training details, please refer to Appendix B.1 of our paper.
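For readers who want to reproduce this step, below is a minimal sketch (not the authors' actual script) of extending XLM-RoBERTa's learned position embeddings to 8194 with Hugging Face `transformers`; the copy-and-tile warm-start and the buffer refresh are assumptions for illustration, and the RetroMAE pre-training itself is not shown.

```python
import torch
from transformers import XLMRobertaModel

model = XLMRobertaModel.from_pretrained("xlm-roberta-large")

new_max_pos = 8194  # 8192 tokens plus the 2 offset positions RoBERTa-style models reserve
old_embeddings = model.embeddings.position_embeddings.weight.data
old_max_pos, dim = old_embeddings.shape  # 514, 1024 for xlm-roberta-large

# Allocate a larger position-embedding table.
new_embeddings = torch.empty(new_max_pos, dim)

# Copy the original learned positions, then tile them to fill the rest
# (one common way to warm-start longer-context training; an assumption here,
# not necessarily what the authors did).
new_embeddings[:old_max_pos] = old_embeddings
for start in range(old_max_pos, new_max_pos, old_max_pos - 2):
    end = min(start + old_max_pos - 2, new_max_pos)
    new_embeddings[start:end] = old_embeddings[2 : 2 + (end - start)]

# Swap in the enlarged embedding table, keeping the original padding index.
new_pos_emb = torch.nn.Embedding(
    new_max_pos, dim, padding_idx=model.embeddings.position_embeddings.padding_idx
)
new_pos_emb.weight.data.copy_(new_embeddings)
model.embeddings.position_embeddings = new_pos_emb
model.config.max_position_embeddings = new_max_pos

# Depending on the transformers version, the embeddings module caches
# position_ids / token_type_ids buffers sized to the old maximum; refresh
# them so sequences longer than 514 tokens pass through the embedding layer.
model.embeddings.register_buffer(
    "position_ids", torch.arange(new_max_pos).unsqueeze(0), persistent=False
)
model.embeddings.register_buffer(
    "token_type_ids", torch.zeros(1, new_max_pos, dtype=torch.long), persistent=False
)

model.save_pretrained("xlm-roberta-large-8194")
```

The resized checkpoint saved at the end would then serve as the starting point for RetroMAE-style pre-training on long sequences from the datasets mentioned above.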

Thank you!

I have also read Appendix B.1, which deepened my understanding. I'm very grateful.
