---
license: mit
base_model: ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2
tags:
- alignment-handbook
- dpo
- trl
- selm
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: SELM-Phi-3-mini-4k-instruct-iter-3
  results: []
---






This model was trained with the method introduced in [Self-Exploring Language Models: Active Preference Elicitation for Online Alignment](https://arxiv.org/abs/2405.19332).



# SELM-Phi-3-mini-4k-instruct-iter-3



This model is a fine-tuned version of [ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2), trained on synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.



## Model description



- Model type: A 3.8B parameter, Phi-3-instruct-based Self-Exploring Language Model (SELM).
- License: MIT
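
A minimal inference sketch (assuming the standard 🤗 Transformers chat-template workflow; the prompt and generation settings below are illustrative, not part of the training setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # may be required for Phi-3, depending on your transformers version
)

# Build a chat-formatted prompt (example question only).
messages = [{"role": "user", "content": "Explain direct preference optimization in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```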



## Results



| Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|-------|------------------------|--------------------|
| [SELM-Phi-3-mini-4k-instruct-iter-3](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-3) | 27.98 | 8.32 |
| [SELM-Phi-3-mini-4k-instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2) | 26.79 | 8.44 |
| [SELM-Phi-3-mini-4k-instruct-iter-1](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-1) | 27.33 | 8.37 |
| [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) | 23.05 | 8.12 |

Our model also ranks highly on [WildBench](https://huggingface.co/spaces/allenai/WildBench)! 🔥

### Training hyperparameters

The following hyperparameters were used during training:
- alpha: 0.001
- beta: 0.01
- train_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
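
For reference, `beta` is the preference-strength (KL-regularization) coefficient of the underlying DPO objective, and `alpha` weights the SELM optimism bonus; see the paper linked above for the exact SELM term. Schematically, the base DPO loss that SELM builds on is:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

The listed total train batch size is consistent with the other settings: 4 (per-device batch) × 8 (devices) × 4 (gradient accumulation steps) = 128.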

### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1