- Secrets of RLHF in Large Language Models Part II: Reward Modeling
  Paper • 2401.06080 • Published • 23
- Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
  Paper • 2406.02900 • Published • 10
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
  Paper • 2406.04151 • Published • 14
- Understanding and Diagnosing Deep Reinforcement Learning
  Paper • 2406.16979 • Published • 8
Xie Yuquan
xieyuquan
AI & ML interests: LLM, multi-modal
Organizations: None yet
Collections: 4
Models: 2
Datasets: None public yet