Phi-3-mini-128K-instruct with CPO-SimPO
This repository contains the Phi-3-mini-128K-instruct model enhanced with the CPO-SimPO technique. CPO-SimPO combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO).
Introduction
Phi-3-mini-128K-instruct is a model optimized for instruction-based tasks. Fine-tuning it with CPO-SimPO improved its scores on several benchmarks, most notably GSM8K and TruthfulQA.
What is CPO-SimPO?
CPO-SimPO is a technique that combines elements of CPO and SimPO:
- Contrastive Preference Optimization (CPO): Adds a behavior cloning regularizer to ensure the model remains close to the preferred data distribution.
- Simple Preference Optimization (SimPO): Incorporates length normalization and target reward margins to prevent the generation of long but low-quality sequences.
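The two ingredients above can be sketched as a single per-pair loss. This is a minimal scalar illustration, not the training code behind this model: it assumes the combined objective is the SimPO preference term (length-normalized log-probabilities with a target margin) plus a weighted behavior cloning term on the preferred response. The function names and the values of `beta`, `gamma`, and `alpha` are hypothetical, and averaging the behavior cloning term over tokens is also an assumption.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cpo_simpo_loss(chosen_logps, rejected_logps, beta=2.0, gamma=0.5, alpha=1.0):
    """Return (total, preference_term, bc_term) for one preference pair.

    chosen_logps / rejected_logps: per-token log-probabilities of the
    preferred and rejected responses under the model being trained.
    beta, gamma, alpha are illustrative values, not the ones used here.
    """
    # SimPO: length normalization, i.e. average log-probability per token.
    avg_chosen = sum(chosen_logps) / len(chosen_logps)
    avg_rejected = sum(rejected_logps) / len(rejected_logps)

    # SimPO: the scaled reward difference must exceed the target margin gamma.
    margin = beta * (avg_chosen - avg_rejected) - gamma
    preference_term = softplus(-margin)  # equals -log(sigmoid(margin))

    # CPO: behavior cloning regularizer, a negative log-likelihood on the
    # preferred response (token-averaged here, which is an assumption).
    bc_term = -avg_chosen

    total = preference_term + alpha * bc_term
    return total, preference_term, bc_term


if __name__ == "__main__":
    total, pref, bc = cpo_simpo_loss([-0.1, -0.2], [-1.0, -1.2])
    print(f"total={total:.4f} preference={pref:.4f} bc={bc:.4f}")
```

Because both responses are scored by their average per-token log-probability, repeating the same tokens to make a response longer does not change the preference term, which is the property the SimPO bullet describes.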
Github
Model Performance
Base Scores:
- MMLU: 68.7
- HellaSwag: 80.09
- GSM8K: 69.52
- ARC: 63.14
- Winogrande: 72.85
- TruthfulQA: 54.12
New Scores after CPO-SimPO:
- MMLU: 68.79
- HellaSwag: 80.78
- GSM8K: 78.01
- ARC: 62.97
- Winogrande: 74.47
- TruthfulQA: 56.19
Key Improvements:
- Enhanced Model Performance: Scores improve on five of the six benchmarks, most notably GSM8K (up by 8.49 points) and TruthfulQA (up by 2.07 points). ARC is essentially unchanged (down by 0.17 points).
- Quality Control: Improved generation of high-quality sequences through length normalization and reward margins.
- Balanced Optimization: The behavior cloning (BC) regularizer helps maintain the integrity of learned preferences without deviating from the preferred data distribution.
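The per-benchmark changes can be recomputed directly from the two score lists above. The short sketch below uses only the numbers reported in this document; the variable names are illustrative.

```python
# Scores copied from the "Base Scores" and "New Scores after CPO-SimPO" lists.
base = {
    "MMLU": 68.7,
    "HellaSwag": 80.09,
    "GSM8K": 69.52,
    "ARC": 63.14,
    "Winogrande": 72.85,
    "TruthfulQA": 54.12,
}
new = {
    "MMLU": 68.79,
    "HellaSwag": 80.78,
    "GSM8K": 78.01,
    "ARC": 62.97,
    "Winogrande": 74.47,
    "TruthfulQA": 56.19,
}

# Point change per benchmark, rounded to two decimals to hide float noise.
deltas = {name: round(new[name] - base[name], 2) for name in base}

# Unweighted averages across the six benchmarks.
base_avg = sum(base.values()) / len(base)
new_avg = sum(new.values()) / len(new)

if __name__ == "__main__":
    for name, delta in deltas.items():
        print(f"{name:<11} {base[name]:6.2f} -> {new[name]:6.2f}  ({delta:+.2f})")
    print(f"Average     {base_avg:6.2f} -> {new_avg:6.2f}  ({new_avg - base_avg:+.2f})")
```

GSM8K accounts for most of the change in the unweighted average, so the average on its own overstates how uniform the gains are.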
Usage
Installation
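To use this model, install the transformers library from Hugging Face, along with torch and accelerate. The inference example imports torch directly, and loading a model with device_map requires accelerate.
pip install transformers torch accelerate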
Inference
Here's an example of how to perform inference with the model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Fix the random seed so repeated runs behave the same way.
torch.random.manual_seed(0)

# Load the fine-tuned model onto the GPU, using the dtype stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "QueryloopAI/Phi-3-mini-128K-instruct-cpo-simpo",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-128K-instruct-cpo-simpo")

# Chat history in role/content format; the last user turn is the one to answer.
messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding (do_sample=False); return only the newly generated text.
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])