Cross-lingual Transfer of Reward Models in Multilingual Alignment (paper, arXiv 2410.18027) • Published Oct 23
Cross-lingual Transfer of Reward Models (collection): the synthetic preference data and trained reward models from "Cross-lingual Transfer of Reward Models in Multilingual Alignment" • 5 items • Updated Oct 31
iqwiki-kor/uf-g4o_translated-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed8049 • Updated Oct 30 • 56.8k rows • 33 downloads
iqwiki-kor/khs-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed6247 • Updated Oct 29 • 10.2k rows • 34 downloads
iqwiki-kor/khs-Qwen2.5-7B-distill-SFT-DPO-beta0.1-seed1903 • Updated Oct 29 • 10.2k rows • 33 downloads
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.1-op-samp4-seed6247 • Updated Oct 29 • 10.2k rows • 33 downloads
iqwiki-kor/Qwen2.5-7B-distill-SFT-DPO-beta0.1-op-samp4-seed42 • Updated Oct 29 • 10.2k rows • 29 downloads