RichardErkhov committed
Commit a5d02f8
1 Parent(s): e597b50

uploaded readme

Files changed (1):
  README.md +101 -0
README.md ADDED
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)

Phi-3-mini-4K-instruct-cpo-simpo - AWQ
- Model creator: https://huggingface.co/Syed-Hasan-8503/
- Original model: https://huggingface.co/Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo/
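
This repository hosts the AWQ-quantized weights, while the usage example further down loads the creator's original full-precision checkpoint. As a complement, here is a minimal sketch of loading an AWQ checkpoint with `transformers`, which dispatches to the `autoawq` package when it is installed. The repository id below is a placeholder, not a confirmed name; substitute the actual quantized repo.

```python
# Hedged sketch: loading an AWQ-quantized checkpoint with transformers.
# Assumes `pip install autoawq transformers` and a CUDA GPU (AWQ kernels
# are GPU-only). The repo id is a placeholder, not a confirmed name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

AWQ_REPO = "RichardErkhov/Phi-3-mini-4K-instruct-cpo-simpo-awq"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(AWQ_REPO)
model = AutoModelForCausalLM.from_pretrained(
    AWQ_REPO,
    device_map="cuda",  # the repo's quantization_config triggers AWQ loading
    torch_dtype=torch.float16,
)
```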

Original model description:
---
license: apache-2.0
---

# Phi-3-mini-4K-instruct with CPO-SimPO

This repository contains the Phi-3-mini-4K-instruct model enhanced with the CPO-SimPO technique. CPO-SimPO combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO).

## Introduction

Phi-3-mini-4K-instruct is a model optimized for instruction-based tasks. Applying CPO-SimPO to it has demonstrated notable improvements on key benchmarks, pushing the boundaries of AI preference learning.

### What is CPO-SimPO?

CPO-SimPO is a novel technique that combines elements from CPO and SimPO (a sketch of the combined objective follows the list):

- **Contrastive Preference Optimization (CPO):** Adds a behavior cloning regularizer to ensure the model remains close to the preferred data distribution.
- **Simple Preference Optimization (SimPO):** Incorporates length normalization and target reward margins to prevent the generation of long but low-quality sequences.
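
As an illustration only, here is a minimal sketch of how such a combined objective is often written: a SimPO-style length-normalized preference loss with a target margin, plus a CPO-style behavior-cloning (NLL) term on the chosen response. The function name, hyperparameter values, and the exact weighting of the NLL term are assumptions for exposition, not the implementation from the CPO-SimPO repository.

```python
# Hedged sketch of a CPO-SimPO-style objective; see the CPO_SIMPO repository
# for the authoritative implementation.
import torch
import torch.nn.functional as F

def cpo_simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
                   beta=2.0, gamma=1.0, bc_weight=1.0):
    """logp_*: summed token log-probs of the chosen/rejected responses under
    the policy; len_*: their token counts. beta, gamma, and bc_weight are
    assumed hyperparameters, not values from the original training run."""
    # SimPO: length-normalized implicit rewards with a target margin gamma
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    simpo = -F.logsigmoid(reward_chosen - reward_rejected - gamma)
    # CPO: behavior-cloning regularizer pulls the policy toward chosen responses
    bc_nll = -logp_chosen
    return (simpo + bc_weight * bc_nll).mean()
```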

### Github

**[CPO-SIMPO](https://github.com/fe1ixxu/CPO_SIMPO)**

## Model Performance

COMING SOON!

### Key Improvements:
- **Enhanced Model Performance:** Significant score improvements, particularly in GSM8K (up by 8.49 points!) and TruthfulQA (up by 2.07 points).
- **Quality Control:** Improved generation of high-quality sequences through length normalization and reward margins.
- **Balanced Optimization:** The BC regularizer helps maintain the integrity of learned preferences without deviating from the preferred data distribution.

## Usage

### Installation

To use this model, you need to install the `transformers` library from Hugging Face.

```bash
pip install transformers
```

### Inference

Here's an example of how to perform inference with the model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)

# Load the creator's original (non-quantized) checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo")

messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Greedy decoding: with do_sample=False the temperature setting has no effect
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]['generated_text'])
```
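
If you prefer not to use the `pipeline` helper, the same chat can be run through the tokenizer's chat template directly. This is a minimal sketch assuming the `model`, `tokenizer`, and `messages` objects from the example above; the decoding settings mirror the pipeline arguments.

```python
# Alternative to the pipeline: apply the chat template and call generate().
# Assumes model, tokenizer, and messages are defined as in the example above.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=500, do_sample=False)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```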