Datasets used in the fine-tuning stage

#16

by yushi - opened Jun 4, 2024

Jun 4, 2024

Hi authors, thanks for the great work! Could you share the scope of the data used in the fine-tuning stage? Specifically, which tasks in the MTEB benchmark are included in the training data?

thenlper

Alibaba-NLP org Jun 12, 2024

The data used in the fine-tuning stage is essentially the same as that described in the paper (https://arxiv.org/abs/2308.03281). GTE v1.5 adds some LLM-generated synthetic data, which is not included in the MTEB benchmarks.
