Could you please provide more details about model training?

#3
by andrew-more - opened

Like the training dataset, loss function, etc.

Owner

Actually, the dataset is also from BGE, https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding#fine-tune , and you can find a sample at https://huggingface.co/datasets/Shitao/bge-reranker-data .

The trainer is also an LLM implementation of the BGE reranker; you can find the loss function at https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/reranker/modeling.py
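
For reference, a minimal sketch of the group-wise cross-entropy loss used there (simplified; the function name and shapes here are illustrative, the actual code is in modeling.py above):

```python
import torch
import torch.nn as nn

def reranker_loss(logits: torch.Tensor, group_size: int) -> torch.Tensor:
    # logits: flat relevance scores of shape (batch_size * group_size,),
    # where each group is one positive passage followed by (group_size - 1) negatives.
    scores = logits.view(-1, group_size)          # (batch_size, group_size)
    # The positive sits at index 0 of every group.
    target = torch.zeros(scores.size(0), dtype=torch.long, device=scores.device)
    return nn.CrossEntropyLoss()(scores, target)

# Example: 2 groups of 4 candidates each.
loss = reranker_loss(torch.randn(8), group_size=4)
```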

Thanks for your reply.
We tried applying an LLM to the task of search relevance judgement, but the performance is quite poor. Our experiment settings are:

  1. Dataset: Chinese search relevance open-source dataset, https://modelscope.cn/datasets/iic/QBQTC/summary
  2. Backbone: Qwen2 1.5B model
  3. Fine-tuning: full-parameter fine-tuning
  4. Epochs: 1
  5. Model structure: a classification head on top of the last token's hidden states (see the sketch after this list).
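
Roughly, what we mean by item 5 (a hedged sketch, not our exact training code; it assumes right padding and a generic Hugging Face backbone):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class LastTokenClassifier(nn.Module):
    """Decoder-only backbone with a linear head over the last non-padding token."""
    def __init__(self, model_name: str, num_labels: int):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.backbone.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Index of the last real (non-padding) token per sequence, assuming right padding.
        last_idx = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0), device=hidden.device), last_idx]
        return self.head(pooled)

# e.g. model = LastTokenClassifier("Qwen/Qwen2-1.5B", num_labels=2)
```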

Are the settings above the same as your implementation?

We tested our code on a sentence classification task and it worked well, but on sentence-pair classification the performance deteriorated.

It seems that an encoder-only (BERT-like) structure performs much better than a decoder-only structure, even though we add the classification head on the last token.

Could you provide some clues to help us figure out this problem?

Sincerely

Owner

Actually, my model is not a classification model, it is a regression model; see https://huggingface.co/neofung/LdIR-Qwen2-reranker-1.5B/discussions/4#67160e1224552afb830b1dd8 .
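
To make the distinction concrete (a hedged sketch; the hidden size and tensors below are illustrative, and the linked discussion describes how this model is actually trained): the head outputs a single continuous relevance score per (query, passage) pair rather than class logits.

```python
import torch
import torch.nn as nn

hidden_size = 1536                       # Qwen2-1.5B hidden size, for illustration
score_head = nn.Linear(hidden_size, 1)   # one continuous relevance score, not class logits

pooled = torch.randn(4, hidden_size)     # last-token hidden states for 4 pairs
scores = score_head(pooled).squeeze(-1)  # shape (4,)
```

Such single-score outputs can then be trained against graded relevance targets or group-wise as in the loss sketch above.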

And please double-check that the padding is correct: is the last token the \n (198)?

last five tokens: <|im_end|>(151645), \n(198), <|im_start|>(151644), assistant(77091), \n(198)

If you applied an incorrect padding strategy, or truncated the context at the maximum model length, you may get the wrong last token's hidden states.
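
A quick way to verify this (assuming the tokenizer shipped with this model and an input built with its chat template ending at the assistant turn; the message content below is just a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("neofung/LdIR-Qwen2-reranker-1.5B")

messages = [{"role": "user", "content": "query ... passage ..."}]  # placeholder content
text = tokenizer.apply_chat_template(messages, tokenize=False,
                                     add_generation_prompt=True)
enc = tokenizer(text, return_tensors="pt")
print(enc["input_ids"][0, -5:].tolist())
# Expected tail: [151645, 198, 151644, 77091, 198]
#   <|im_end|>, \n, <|im_start|>, assistant, \n
# If padding or max-length truncation changes this tail, the pooled last-token
# hidden state will be wrong.
```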
