Text Generation
GGUF
English
LoneStriker committed on
Commit
41784da
1 Parent(s): 93500fe

Upload folder using huggingface_hub

Browse files
.gitattributes CHANGED
@@ -1,35 +1,5 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Mistral7B-PairRM-SPPO-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+ Mistral7B-PairRM-SPPO-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Mistral7B-PairRM-SPPO-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Mistral7B-PairRM-SPPO-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Mistral7B-PairRM-SPPO-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
Mistral7B-PairRM-SPPO-Q3_K_L.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:100cd3b281e616770059a4c75c072eb535bdff21829d3a41cce37676615888d7
+ size 3822025056
Mistral7B-PairRM-SPPO-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d0c1a90224c242686e13d880cae83c5acb552893b83890e41eee18365db9647
+ size 4368439648
Mistral7B-PairRM-SPPO-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:64e959a1353c81ab798d0e3d3d639803d763e9a810d562adf31a8b6547759e8b
+ size 5131409760
Mistral7B-PairRM-SPPO-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc7ce9b26dbe9a52a032c6363d368e8cdfb765cf74ea0287c86142c0c5831e13
+ size 5942065504
Mistral7B-PairRM-SPPO-Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d7705d37ea6c1e5f93fd99ec919c1d7af08c060fa980e98a1109b38de08b812b
+ size 7695858016
README.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - openbmb/UltraFeedback
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+ Self-Play Preference Optimization for Language Model Alignment (https://arxiv.org/abs/2405.00675)
+
+ # Mistral7B-PairRM-SPPO
+
+ This model was developed using [Self-Play Preference Optimization](https://arxiv.org/abs/2405.00675) at iteration 3, starting from [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). We used the prompt sets from the [openbmb/UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset, split into three parts for the three iterations following [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset). All responses used are synthetic.
+
+ While K = 5 (five responses are generated per prompt at each iteration), this model estimates the soft probabilities P(y_w > y_l) and P(y_l > y_w) from only three of those samples: the winner, the loser, and one other randomly chosen response. This approach has been shown to deliver better performance on AlpacaEval 2.0 than the results reported in [the paper](https://arxiv.org/abs/2405.00675), but it might also lead to overfitting to the PairRM score.
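The soft probabilities above are Bradley-Terry style quantities derived from a preference model's scalar scores. A minimal sketch of how such a pairwise probability can be computed, assuming PairRM-style scalar scores (the score values below are hypothetical, for illustration only):

```python
import math

def soft_pref_prob(score_a: float, score_b: float) -> float:
    # Bradley-Terry style soft preference: P(a beats b) from scalar scores.
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

# Hypothetical preference-model scores for the three samples used:
# the winner, the loser, and one extra random sample.
scores = {"winner": 2.0, "loser": -1.0, "random": 0.5}

p_w_over_l = soft_pref_prob(scores["winner"], scores["loser"])
p_l_over_w = soft_pref_prob(scores["loser"], scores["winner"])

# The two directions are complementary: P(y_w > y_l) + P(y_l > y_w) = 1.
assert abs(p_w_over_l + p_l_over_w - 1.0) < 1e-9
```

Because the sigmoid of a score difference is used, the two ordered probabilities always sum to one, so estimating both directions from the same scores is consistent by construction.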
+
+ ❗Please refer to the original checkpoint at [**UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3**](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3) as **reported in our paper**. We anticipate that the version in the paper demonstrates a more consistent performance improvement across all benchmark tasks.
+
+
+ ## Links to Other Models
+ - [Mistral7B-PairRM-SPPO-Iter1](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter1)
+ - [Mistral7B-PairRM-SPPO-Iter2](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter2)
+ - [Mistral7B-PairRM-SPPO-Iter3](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO-Iter3)
+ - [Mistral7B-PairRM-SPPO](https://huggingface.co/UCLA-AGI/Mistral7B-PairRM-SPPO)
+
+
+ ### Model Description
+
+ - Model type: A 7B-parameter GPT-like model fine-tuned on synthetic datasets.
+ - Language(s) (NLP): Primarily English
+ - License: Apache-2.0
+ - Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2
+
+
+ ## [AlpacaEval Leaderboard Evaluation Results](https://tatsu-lab.github.io/alpaca_eval/)
+ | Model | LC. Win Rate | Win Rate | Avg. Length |
+ |-------|--------------|----------|-------------|
+ | Mistral7B-PairRM-SPPO | 30.46 | 32.14 | 2114 |
+ | Mistral7B-PairRM-SPPO (best-of-16) | 32.90 | 34.67 | 2112 |
+
+
+ ### Training hyperparameters
+ The following hyperparameters were used during training:
+
+ - learning_rate: 5e-07
+ - eta: 1000
+ - per_device_train_batch_size: 8
+ - gradient_accumulation_steps: 1
+ - seed: 42
+ - distributed_type: deepspeed_zero3
+ - num_devices: 8
+ - optimizer: RMSProp
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_train_epochs: 18.0 (stopped at epoch 1.0)
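The `eta` above is the scaling constant in the SPPO objective, which pushes the log-probability ratio between the policy and the reference model toward the scaled preference signal. A minimal scalar sketch of the SPPO-style per-example squared loss, with purely illustrative inputs (this is a simplification, not the training implementation):

```python
def sppo_loss(log_ratio: float, p_win: float, eta: float = 1000.0) -> float:
    # SPPO-style per-example squared loss:
    # (log(pi_theta(y|x) / pi_ref(y|x)) - eta * (P(y beats pi_t | x) - 1/2))^2
    target = eta * (p_win - 0.5)
    return (log_ratio - target) ** 2

# The loss vanishes exactly when the log-ratio matches the scaled signal;
# with the default eta = 1000, a sure winner (p_win = 1.0) targets 500.
assert sppo_loss(log_ratio=500.0, p_win=1.0) == 0.0
# A tie (p_win = 0.5) pulls the policy back toward the reference model.
assert sppo_loss(log_ratio=0.0, p_win=0.5) == 0.0
```

Note how a large `eta` (here 1000) amplifies small differences in the estimated win probability into large log-ratio targets, which is why the soft probability estimates discussed earlier matter.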
+
+
+ ## Citation
+ ```
+ @misc{wu2024self,
+   title={Self-Play Preference Optimization for Language Model Alignment},
+   author={Wu, Yue and Sun, Zhiqing and Yuan, Huizhuo and Ji, Kaixuan and Yang, Yiming and Gu, Quanquan},
+   year={2024},
+   eprint={2405.00675},
+   archivePrefix={arXiv},
+   primaryClass={cs.LG}
+ }
+ ```