SELM-Llama
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment.
This model is a fine-tuned version of ZhangShenao/SELM-Llama-3-8B-Instruct-iter-1, trained on synthetic data based on the HuggingFaceH4/ultrafeedback_binarized dataset.
Model | AlpacaEval 2.0 (LC WR) | MT-Bench (Average)
---|---|---
SELM-Llama-3-8B-Instruct-iter-3 | 33.47 | 8.29
SELM-Llama-3-8B-Instruct-iter-2 | 35.65 | 8.09
SELM-Llama-3-8B-Instruct-iter-1 | 32.02 | 7.92
Meta-Llama-3-8B-Instruct | 24.31 | 7.93
The following hyperparameters were used during training:
Base model: meta-llama/Meta-Llama-3-8B-Instruct
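The card does not include a usage snippet. A minimal sketch of querying this checkpoint with the Hugging Face `transformers` text-generation pipeline might look like the following; the model id (`ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2`) is assumed from the table above, and running it requires downloading roughly 16 GB of weights plus a GPU with enough memory to hold them.

```python
# Sketch only: querying a SELM-Llama checkpoint via the transformers pipeline.
# The model id below is an assumption based on the table in this card.

def build_messages(prompt: str) -> list[dict]:
    """Wrap a user prompt in the Llama 3 chat-message format."""
    return [{"role": "user", "content": prompt}]

def generate(prompt: str,
             model_id: str = "ZhangShenao/SELM-Llama-3-8B-Instruct-iter-2") -> str:
    # Heavy dependency imported lazily; first call downloads the weights.
    from transformers import pipeline
    pipe = pipeline("text-generation", model=model_id, device_map="auto")
    out = pipe(build_messages(prompt), max_new_tokens=256)
    # With chat-style input, generated_text is the full message list;
    # the last entry is the assistant's reply.
    return out[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    print(generate("What is direct preference optimization?"))
```

The lazy import keeps the module importable without `transformers` installed, which is convenient when the generation path is only exercised on a GPU machine.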