Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning • Paper • 2410.22304 • Published 24 days ago • 15
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models • Paper • 2403.07384 • Published Mar 12 • 1
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI • Paper • 2410.11096 • Published Oct 14 • 12
Weak-to-Strong Extrapolation Expedites Alignment • Collection • Better aligned models obtained by weak-to-strong model extrapolation (ExPO) • 25 items • Updated 27 days ago • 16
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models • Paper • 2401.01335 • Published Jan 2 • 64