💡 DICE - a sail Collection

💡 DICE

updated Jul 28, 2024

Self-alignment with DPO Implicit Rewards

Bootstrapping Language Models with DPO Implicit Rewards
Paper • 2406.09760 • Published Jun 14, 2024 • 39

sail/Llama-3-Base-8B-DICE-Iter1
Text Generation • Updated 3 days ago • 23 • 2

sail/Llama-3-Base-8B-DICE-Iter2
Text Generation • Updated 3 days ago • 27 • 3

sail/Zephyr-7B-DICE-Iter1
Text Generation • Updated 3 days ago • 24

sail/Zephyr-7B-DICE-Iter2
Text Generation • Updated 3 days ago • 20 • 1