DICE Collection
Self-alignment with DPO Implicit Rewards
This collection contains 5 items.
This is a model released alongside the preprint "Bootstrapping Language Models with DPO Implicit Rewards". Please refer to our repository for more details.
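Here is a rough illustration of the idea in the collection title. DPO defines an implicit reward for a response y to a prompt x as β(log π_θ(y|x) − log π_ref(y|x)), where π_θ is the current policy and π_ref is a reference model. The sketch below makes three assumptions:

- Each sampled response already carries its summed log-probabilities under both models.
- `Response`, `implicit_reward`, `build_preference_pair`, and β = 0.1 are hypothetical names and values chosen for illustration.
- Taking the highest- and lowest-reward responses as a chosen/rejected pair is an assumed simplification. It is not necessarily the exact procedure used in the preprint or the repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Response:
    # Hypothetical container for one sampled response.
    text: str
    policy_logprob: float  # summed token log-probs under the current policy, log pi_theta(y|x)
    ref_logprob: float     # summed token log-probs under the reference model, log pi_ref(y|x)


def implicit_reward(resp: Response, beta: float = 0.1) -> float:
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (resp.policy_logprob - resp.ref_logprob)


def build_preference_pair(
    responses: List[Response], beta: float = 0.1
) -> Tuple[Response, Response]:
    """Hypothetical helper: rank responses by implicit reward and return
    (chosen, rejected) as the highest- and lowest-scoring responses."""
    if len(responses) < 2:
        raise ValueError("need at least two responses to form a pair")
    ranked = sorted(responses, key=lambda r: implicit_reward(r, beta))
    return ranked[-1], ranked[0]


# Usage with made-up log-probabilities for three sampled responses.
responses = [
    Response("a", policy_logprob=-12.0, ref_logprob=-15.0),
    Response("b", policy_logprob=-20.0, ref_logprob=-18.0),
    Response("c", policy_logprob=-9.0, ref_logprob=-10.0),
]
chosen, rejected = build_preference_pair(responses)
print("chosen:", chosen.text, "rejected:", rejected.text)
```

The resulting (chosen, rejected) pairs could then serve as preference data for a further round of DPO training. This is how such a pairing step would let a model bootstrap from its own implicit rewards without an external reward model.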