We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀
🤗 paper: Self-Play Preference Optimization for Language Model Alignment (2405.00675)
⭐ code: https://github.com/uclaml/SPPO
🤗 models: UCLA-AGI/sppo-6635fdd844f2b2e4a94d0b9a