Building Math Agents with Multi-Turn Iterative Preference Learning • Paper • 2409.02392 • Published Sep 4, 2024 • 14
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order • Paper • 2404.00399 • Published Mar 30, 2024 • 41
Comparing DPO with IPO and KTO • Collection • A collection of chat models to explore the differences between three alignment techniques: DPO, IPO, and KTO. • 56 items • Updated Jan 9, 2024 • 32
UDOP • Collection • UDOP is a general multimodal model for document AI • 4 items • Updated Jul 11, 2024 • 23