---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- SimPO
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
---
This model was released with the preprint *SimPO: Simple Preference Optimization with a Reference-Free Reward*. SimPO is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) on preference datasets. It aligns the reward function with the generation likelihood, eliminating the need for a reference model, and incorporates a target reward margin to boost performance. Please refer to our GitHub repo for more details.
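
As a sketch of the objective described above (notation follows the preprint: $y_w$ and $y_l$ are the preferred and dispreferred responses, $\beta$ is a scaling constant, and $\gamma$ is the target reward margin), the SimPO loss compares length-normalized log-likelihoods of the two responses:

$$
\mathcal{L}_{\text{SimPO}}(\pi_\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\!\left[\log \sigma\!\left(\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x) \;-\; \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x) \;-\; \gamma\right)\right]
$$

Because the implicit reward is the (length-normalized) generation likelihood itself, no reference policy appears in the loss, which is what removes the need for a reference model during training.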
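
Below is a minimal generation sketch with 🤗 Transformers. The repository id is a placeholder assumption; substitute this model's actual Hugging Face path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- replace with this model's actual Hugging Face path.
model_id = "your-org/Mistral-7B-Instruct-SimPO"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The model follows the Mistral-7B-Instruct-v0.2 chat template.
messages = [{"role": "user", "content": "What is preference optimization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```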