---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- SimPo
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
---
|
This model was released with the preprint *[SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734)*. SimPO is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) on preference datasets.
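
The checkpoint can be used with the standard `transformers` text-generation API. A minimal usage sketch follows; the repo id below is an assumption for illustration, so substitute this model's actual Hub id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Mistral-7B-Instruct-SimPO"  # assumed repo id; replace with this model's Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Mistral-Instruct checkpoints ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "Explain preference optimization for LLMs in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```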
|
SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance.
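
Concretely, the paper defines the reward for a response y as its length-normalized log-likelihood, (beta / |y|) * log pi_theta(y | x), and the loss asks the chosen response to beat the rejected one by at least a margin gamma. Below is a minimal, unofficial PyTorch sketch of this loss; the tensor names and default beta/gamma values are illustrative, not the tuned settings from the paper:

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,      # summed per-token log-probs of the chosen responses, shape (batch,)
    rejected_logps: torch.Tensor,    # summed per-token log-probs of the rejected responses, shape (batch,)
    chosen_lengths: torch.Tensor,    # token counts of the chosen responses, shape (batch,)
    rejected_lengths: torch.Tensor,  # token counts of the rejected responses, shape (batch,)
    beta: float = 2.0,               # reward scale (illustrative value)
    gamma: float = 1.0,              # target reward margin (illustrative value)
) -> torch.Tensor:
    # Reference-free, length-normalized rewards: r(y) = (beta / |y|) * log pi_theta(y | x)
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths
    # Bradley-Terry objective with a target margin gamma between the two rewards
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```

Because the reward is computed directly from the policy's own (length-normalized) likelihood, no frozen reference model needs to be kept in memory during training.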
|
Please refer to our [GitHub repo](https://github.com/princeton-nlp/SimPO) for more details.