From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline • 2406.11939 • Published 12 days ago
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection • 2310.11511 • Published Oct 17, 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment • 2310.00212 • Published Sep 30, 2023