Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension Paper • 2412.03704 • Published Dec 4, 2024 • 6
Stabilizing RLHF through Advantage Model and Selective Rehearsal Paper • 2309.10202 • Published Sep 18, 2023 • 9
Collaborative decoding of critical tokens for boosting factuality of large language models Paper • 2402.17982 • Published Feb 28, 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18, 2024 • 54
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching Paper • 2406.06326 • Published Jun 10, 2024 • 2
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published Jun 30, 2024 • 7
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning Paper • 2410.06508 • Published Oct 9, 2024 • 10
DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects Paper • 2410.02730 • Published Oct 3, 2024
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 47
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows Paper • 2409.17433 • Published Sep 25, 2024 • 9
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28, 2024 • 96
The Trickle-down Impact of Reward (In-)consistency on RLHF Paper • 2309.16155 • Published Sep 28, 2023