arXiv:2412.17364

Efficient Fine-Tuning Methodology of Text Embedding Models for Information Retrieval: Contrastive Learning Penalty (CLP)

Published on Dec 23, 2024

Abstract

Text embedding models play a crucial role in natural language processing, particularly in information retrieval, and their importance has been further highlighted by the recent adoption of Retrieval-Augmented Generation (RAG). This study presents an efficient fine-tuning methodology encompassing data selection, loss function, and model architecture to enhance the information retrieval performance of pre-trained text embedding models. In particular, it proposes a novel Contrastive Learning Penalty (CLP) function that overcomes limitations of standard contrastive learning. The proposed methodology achieves significant performance improvements over existing methods on document retrieval tasks. This study is expected to contribute to improving the performance of information retrieval systems through fine-tuning of text embedding models. The code for this study is available at https://github.com/CreaLabs/Enhanced-BGE-M3-with-CLP-and-MoE, and the best-performing models are available at https://huggingface.co/CreaLabs.
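
Since the abstract names the CLP loss but does not spell out its formulation, the sketch below shows only the general shape of contrastive fine-tuning with an added penalty term, written in PyTorch. The InfoNCE loss over mined hard negatives is standard; the penalty term shown (discouraging hard negatives from aligning with the positive document) is a hypothetical stand-in, not the paper's actual CLP, whose definition lives in the linked repository. The function name and tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def contrastive_loss_with_penalty(query_emb, pos_emb, neg_emb,
                                  temperature=0.05, penalty_weight=0.1):
    # Normalize so dot products are cosine similarities.
    q = F.normalize(query_emb, dim=-1)   # (B, D) query embeddings
    p = F.normalize(pos_emb, dim=-1)     # (B, D) positive documents
    n = F.normalize(neg_emb, dim=-1)     # (B, K, D) hard negatives

    pos_sim = (q * p).sum(dim=-1, keepdim=True)   # (B, 1)
    neg_sim = torch.einsum("bd,bkd->bk", q, n)    # (B, K)

    # Standard InfoNCE: the positive should out-score every hard negative.
    logits = torch.cat([pos_sim, neg_sim], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    nce = F.cross_entropy(logits, labels)

    # Hypothetical penalty term (NOT the paper's exact CLP): additionally
    # discourage hard negatives from aligning with the positive document.
    penalty = torch.einsum("bd,bkd->bk", p, n).clamp(min=0).mean()

    return nce + penalty_weight * penalty

# Usage with random embeddings (4 queries, 7 negatives each, dim 1024):
q = torch.randn(4, 1024)
pos = torch.randn(4, 1024)
neg = torch.randn(4, 7, 1024)
print(contrastive_loss_with_penalty(q, pos, neg))

In practice the embeddings would come from the pre-trained encoder being fine-tuned (e.g., BGE-M3) rather than from torch.randn.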

Models citing this paper: 3

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 0
