I wonder if there is any possibility that you can make your collected pairwise dataset open-source?

#4
by fakerbaby - opened

The 0.4B PairRM shows strong performance for ranking and comparing candidates; in practice, however, its ability does not generalize well to some models' outputs. Based on recent papers, I believe larger RMs trained on the same data would generalize better.
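
For context, by ranking and comparing I mean the usage shown on the model card, roughly along these lines (a rough sketch using the llm-blender package; the install path, example texts, and exact arguments here are my own and may need adjusting):

```python
# Sketch of ranking and pairwise comparison with PairRM via llm-blender.
# Assumes: pip install git+https://github.com/yuchenlin/LLM-Blender.git
import llm_blender

blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")  # load the 0.4B PairRM ranker

inputs = ["Explain what a reward model is in one sentence."]
candidates = [[
    "A reward model scores responses so they can be ranked or used for RLHF.",
    "It is a kind of dataset.",
    "A reward model is a large language model.",
]]

# Rank all candidates for each input.
ranks = blender.rank(inputs, candidates, return_scores=False, batch_size=1)
print(ranks)

# Pairwise comparison of two candidates per input (per the model card's compare API).
candidates_A = ["A reward model scores responses so they can be ranked or used for RLHF."]
candidates_B = ["It is a kind of dataset."]
comparison = blender.compare(inputs, candidates_A, candidates_B)
print(comparison)  # True where candidate A is preferred over candidate B
```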

fakerbaby changed discussion title from I wander if there is any possibility that you can make your collected pairwise dataset open-source ? to I wonder if there is any possibility that you can make your collected pairwise dataset open-source ?
LLM Blender org

Please check out the processed Unified-Feedback dataset.
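
For convenience, it should be loadable with the `datasets` library along these lines (a sketch; the repo id `llm-blender/Unified-Feedback` and the config and split names are assumptions to verify on the dataset page):

```python
# Sketch of loading the processed Unified-Feedback data with Hugging Face datasets.
from datasets import load_dataset

# Repo id and config name are assumptions; check the dataset page for the exact names.
dataset = load_dataset("llm-blender/Unified-Feedback", "all")
print(dataset)
print(dataset["train"][0])  # "train" split name is also an assumption
```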
