---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- SimPo
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
---
This model was released with the preprint *[SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734)*. SimPO is an offline preference optimization algorithm for training large language models (LLMs) on preference datasets.
SimPO aligns the reward function with the generation likelihood (the length-normalized log probability of a response), which eliminates the need for a reference model, and incorporates a target reward margin to boost performance.
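To make the idea concrete, here is a minimal sketch of the SimPO loss for a single preference pair, following the formulation in the paper. This is an illustrative helper, not the official implementation; the function name, argument names, and default hyperparameter values are our own.

```python
import math

def simpo_loss(sum_logp_w, len_w, sum_logp_l, len_l, beta=2.0, gamma=0.5):
    """SimPO loss for one preference pair (illustrative sketch).

    sum_logp_w / sum_logp_l: summed token log-probabilities of the chosen
    (winning) and rejected (losing) responses under the policy model.
    len_w / len_l: response lengths in tokens, used for length normalization.
    beta: reward scaling factor; gamma: target reward margin.
    """
    # Implicit reward = beta * average log probability under the policy,
    # so no reference model is needed.
    r_w = beta * sum_logp_w / len_w
    r_l = beta * sum_logp_l / len_l
    # Loss = -log sigmoid(r_w - r_l - gamma); written via log1p for stability.
    margin = r_w - r_l - gamma
    return math.log1p(math.exp(-margin))
```

A larger reward gap between the chosen and rejected responses (beyond the margin gamma) drives the loss toward zero, which is what pushes the policy to prefer the chosen response by at least the target margin.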
Please refer to our [GitHub repo](https://github.com/princeton-nlp/SimPO) for more details.