Syed-Hasan-8503 committed
Commit 2896ef3
1 Parent(s): 4c0f5ef

Update README.md

Files changed (1)
  1. README.md +4 -6
README.md CHANGED

@@ -2,13 +2,13 @@
 license: apache-2.0
 ---
 
-# Phi-3-mini-128K-instruct with CPO-SimPO
+# Phi-3-mini-4K-instruct with CPO-SimPO
 
 This repository contains the Phi-3-mini-128K-instruct model enhanced with the CPO-SimPO technique. CPO-SimPO combines Contrastive Preference Optimization (CPO) and Simple Preference Optimization (SimPO).
 
 ## Introduction
 
-Phi-3-mini-128K-instruct is a model optimized for instruction-based tasks. This approach has demonstrated notable improvements in key benchmarks, pushing the boundaries of AI preference learning.
+Phi-3-mini-4K-instruct is a model optimized for instruction-based tasks. This approach has demonstrated notable improvements in key benchmarks, pushing the boundaries of AI preference learning.
 
 ### What is CPO-SimPO?
 
@@ -26,8 +26,6 @@ CPO-SimPO is a novel technique, which combines elements from CPO and SimPO:
 
 COMING SOON!
 
-- **TruthfulQA:** 56.19
-
 ### Key Improvements:
 - **Enhanced Model Performance:** Significant score improvements, particularly in GSM8K (up by 8.49 points!) and TruthfulQA (up by 2.07 points).
 - **Quality Control:** Improved generation of high-quality sequences through length normalization and reward margins.
@@ -54,12 +52,12 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
 torch.random.manual_seed(0)
 
 model = AutoModelForCausalLM.from_pretrained(
-    "Syed-Hasan-8503/Phi-3-mini-128K-instruct-cpo-simpo",
+    "Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo",
     device_map="cuda",
     torch_dtype="auto",
     trust_remote_code=True,
 )
-tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-128K-instruct-cpo-simpo")
+tokenizer = AutoTokenizer.from_pretrained("Syed-Hasan-8503/Phi-3-mini-4K-instruct-cpo-simpo")
 
 messages = [
     {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
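
The README hunks above mention length normalization and reward margins, but the "What is CPO-SimPO?" section itself is not part of this diff. As a rough sketch only: SimPO uses the length-normalized average log-probability of a response as an implicit reward with a target margin, and CPO adds a negative log-likelihood anchor on the chosen response; CPO-SimPO combines the two. The function name and hyperparameter values below are illustrative assumptions, not the configuration used to train this checkpoint.

```python
import math

def cpo_simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
                   beta=2.0, gamma=0.5, nll_weight=0.1):
    """Illustrative CPO-SimPO objective for one preference pair.

    logp_*: summed token log-probabilities of each response under the policy.
    Dividing by response length gives SimPO's length-normalized implicit
    reward; gamma is the target reward margin; the NLL term is the CPO-style
    anchor on the chosen response. Values of beta, gamma, and nll_weight
    here are placeholders.
    """
    reward_chosen = beta * logp_chosen / len_chosen        # length-normalized
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    pref_loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    nll_loss = -logp_chosen / len_chosen                    # anchor on chosen
    return pref_loss + nll_weight * nll_loss
```

The loss shrinks as the chosen response's per-token log-probability exceeds the rejected one's by more than the margin, which is how the technique discourages long, low-quality sequences without a reference model.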