SAIL-Sailor/sailor2_8B_sft_500step_hf_1102_longxu_dpo_417step_zichen Text Generation • Updated Nov 25, 2024 • 6
Locality Sensitive Sparse Encoding for Learning World Models Online Paper • 2401.13034 • Published Jan 23, 2024
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning Paper • 2402.03046 • Published Feb 5, 2024 • 6
Bootstrapping Language Models with DPO Implicit Rewards Paper • 2406.09760 • Published Jun 14, 2024 • 39