Datasets used in the fine-tuning stage

#16
by yushi - opened

Hi authors, thanks for the great work! Could you share the scope of the data used in the fine-tuning stage? Specifically, which tasks in the MTEB benchmark are included in the training data?

Alibaba-NLP org

The data used in the fine-tuning stage is basically the same as that described in the paper (https://arxiv.org/abs/2308.03281). GTE v1.5 adds some synthetic data generated by an LLM, which is not part of the MTEB benchmarks.
