e5-large-v2: requirements for training in non-English languages?

by wilfoderek

Friends, congratulations on the amazing work! My name is Wilfredo and I would like to train this model for a non-English language. What further modifications must be made to achieve that goal?
And could you please describe the hardware needed to train this model?

Hi @wilfoderek, thanks for your interest.

The vocabulary of this model is mostly English, so you need to change it to a multilingual model (e.g., multilingual-bert / xlm-roberta). Also, you need to curate a collection of multilingual datasets for training.
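As a rough illustration (not the exact recipe from the paper), swapping in a multilingual backbone and training with an in-batch-negative contrastive loss could look something like the sketch below; the mean pooling, the "query:" / "passage:" prefixes, and the temperature value are simplified assumptions for the example:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

backbone = "xlm-roberta-base"  # multilingual vocabulary, unlike an English-only BERT
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModel.from_pretrained(backbone)

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    outputs = model(**batch)
    # mean pooling over non-padding tokens, then L2-normalize
    mask = batch["attention_mask"].unsqueeze(-1).float()
    emb = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(emb, dim=-1)

# one contrastive step on (query, positive passage) pairs with in-batch negatives
queries = ["query: how do I bake bread?", "query: what is gravity?"]
passages = ["passage: Bread is baked by ...", "passage: Gravity is the force ..."]
q, p = embed(queries), embed(passages)
scores = q @ p.T / 0.01  # temperature here is an illustrative choice
loss = F.cross_entropy(scores, torch.arange(len(queries)))
loss.backward()
```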

We have released a multilingual model at https://huggingface.co/intfloat/multilingual-e5-base , which you may want to check out.
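For a quick try of that checkpoint, something along these lines should work (the "query:" / "passage:" prefixes follow the model card; the example texts are only placeholders):

```python
# Minimal usage sketch for the released multilingual checkpoint.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-base")
queries = ["query: ¿Cómo se entrena un modelo de embeddings?"]
passages = ["passage: E5 se entrena de forma contrastiva con pares de texto."]
q_emb = model.encode(queries, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)
print(q_emb @ p_emb.T)  # cosine similarities, since embeddings are normalized
```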

For hardware requirements, as described in our paper, training the large-size model takes roughly 4 days on 64 V100 GPUs.

Thank you for your prompt answer.
