Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning • Paper • 2410.22304 • Published Oct 29, 2024 • 17
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models • Paper • 2403.07384 • Published Mar 12, 2024 • 1
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI • Paper • 2410.11096 • Published Oct 14, 2024 • 12
Weak-to-Strong Extrapolation Expedites Alignment • Collection • Better aligned models obtained by weak-to-strong model extrapolation (ExPO) • 25 items • Updated 26 days ago • 17
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models • Paper • 2401.01335 • Published Jan 2, 2024 • 64