Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4, 2024 • 61