---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- SimPo
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
---
|
This model was released with the preprint *[SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734)*. SimPO is an offline preference optimization algorithm designed to enhance the training of large language models (LLMs) on preference datasets.
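
The checkpoint can be used with the standard `transformers` text-generation API. A minimal usage sketch follows; the repo id below is an assumption for illustration, so substitute this model's actual Hub id:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "princeton-nlp/Mistral-7B-Instruct-SimPO"  # assumed repo id; replace with this model's Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Mistral-Instruct checkpoints ship a chat template; apply it before generating.
messages = [{"role": "user", "content": "Explain preference optimization for LLMs in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```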
|
SimPO aligns the reward function with the generation likelihood, eliminating the need for a reference model and incorporating a target reward margin to boost performance.
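
Concretely, the paper defines the reward for a response y as its length-normalized log-likelihood, (beta / |y|) * log pi_theta(y | x), and the loss asks the chosen response to beat the rejected one by at least a margin gamma. Below is a minimal, unofficial PyTorch sketch of this loss; the tensor names and default beta/gamma values are illustrative, not the tuned settings from the paper:

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,      # summed per-token log-probs of the chosen responses, shape (batch,)
    rejected_logps: torch.Tensor,    # summed per-token log-probs of the rejected responses, shape (batch,)
    chosen_lengths: torch.Tensor,    # token counts of the chosen responses, shape (batch,)
    rejected_lengths: torch.Tensor,  # token counts of the rejected responses, shape (batch,)
    beta: float = 2.0,               # reward scale (illustrative value)
    gamma: float = 1.0,              # target reward margin (illustrative value)
) -> torch.Tensor:
    # Reference-free, length-normalized rewards: r(y) = (beta / |y|) * log pi_theta(y | x)
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths
    # Bradley-Terry objective with a target margin gamma between the two rewards
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```

Because the reward is computed directly from the policy's own (length-normalized) likelihood, no frozen reference model needs to be kept in memory during training.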
|
Please refer to our [GitHub repo](https://github.com/princeton-nlp/SimPO) for more details.