Zephyr ORPO - a HuggingFaceH4 Collection

Zephyr ORPO

updated Apr 12, 2024

Models and datasets for aligning LLMs with Odds Ratio Preference Optimisation (ORPO). Training recipes are available in the alignment handbook: https://github.com/huggingface/alignment-handbook
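For readers new to ORPO, here is a minimal sketch of the objective as described in the ORPO paper listed in this collection: a supervised (negative log-likelihood) term on the chosen response plus a weighted odds-ratio penalty that favours the chosen response over the rejected one. The function names, the scalar inputs, and the weight value `lam=0.1` are illustrative assumptions, not the alignment-handbook implementation; in practice the log-probabilities would be length-normalised averages over response tokens, computed from a model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p, for p strictly between 0 and 1."""
    # log(1 - p) is computed as log1p(-exp(log p)); logp must be negative.
    return logp - math.log1p(-math.exp(logp))


def odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected))."""
    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-log_or)


def orpo_loss(nll_chosen: float, logp_chosen: float,
              logp_rejected: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus lam times the odds-ratio term.

    lam is a hypothetical illustrative weight, not a recommended setting.
    """
    return nll_chosen + lam * odds_ratio_loss(logp_chosen, logp_rejected)


if __name__ == "__main__":
    # Toy numbers: the chosen response is more likely than the rejected one.
    print(orpo_loss(nll_chosen=0.5, logp_chosen=-0.5, logp_rejected=-2.0))
```

Because the penalty depends only on the policy's own probabilities for the two responses, no separate reference model is needed, which is the point made in the paper's title.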

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12, 2024 • 64

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Text Generation • Updated Apr 18, 2024 • 261 • 266

argilla/distilabel-capybara-dpo-7k-binarized

Viewer • Updated Jul 16, 2024 • 7.56k • 2.62k • 181