File size: 2,710 Bytes
a137e28 280d697 a137e28 de83e40 a137e28 de83e40 280d697 de83e40 280d697 a137e28 280d697 a137e28 280d697 a137e28 280d697 ec62026 280d697 a137e28 44318e4 280d697 a137e28 280d697 a137e28 280d697 b23abb1 a137e28 280d697 a137e28 280d697 a137e28 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
---
license: mit
base_model: ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2
tags:
- alignment-handbook
- dpo
- trl
- selm
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: SELM-Phi-3-mini-4k-instruct-iter-3
results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
[Self-Exploring Language Models: Active Preference Elicitation for Online Alignment](https://arxiv.org/abs/2405.19332).
# SELM-Phi-3-mini-4k-instruct-iter-3
This model is a fine-tuned version of [ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2) using synthetic data based on on the HuggingFaceH4/ultrafeedback_binarized dataset.
## Model description
- Model type: A 3.8B parameter Phi3-instruct-based Self-Exploring Language Models (SELM).
- License: MIT
## Results
| | AlpacaEval 2.0 (LC WR) | MT-Bench (Average) |
|----------------------------------------|------------------------|--------------------|
| [SELM-Phi-3-mini-4k-instruct-iter-3](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-3) |        27.98 |       8.32 |
| [SELM-Phi-3-mini-4k-instruct-iter-2](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-2) |        26.79 |       8.44 |
| [SELM-Phi-3-mini-4k-instruct-iter-1](https://huggingface.co/ZhangShenao/SELM-Phi-3-mini-4k-instruct-iter-1) |        27.33 |       8.37 |
| [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) |        23.05 |       8.12 |
Our model also ranks highly on [WildBench](https://huggingface.co/spaces/allenai/WildBench)! 🔥
### Training hyperparameters
The following hyperparameters were used during training:
- alpha: 0.001
- beta: 0.01
- train_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
### Framework versions
- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1
|