LLM_Alignment - a lole25 Collection

Updated Jun 1

iREPO: implicit Reward Pairwise Difference based Empirical Preference Optimization

Paper • 2405.15230 • Published May 24