Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (https://arxiv.org/abs/2401.01335)

UCLA-AGI/zephyr-7b-sft-full-SPIN-iter0

This model is the iteration-0 self-play fine-tuned (SPIN) model, trained from alignment-handbook/zephyr-7b-sft-full on synthetic data built from the HuggingFaceH4/ultrachat_200k dataset.
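At iteration 0, SPIN treats the ultrachat_200k responses as "real" data and the responses generated by the SFT model itself as "synthetic" data. It then trains the model to prefer the real responses. The sketch below computes a per-example loss of the logistic form described in the cited paper from four precomputed sequence log-probabilities. The function name, the argument names, and the default `lam=0.1` are illustrative assumptions and are not taken from this card or from the authors' code.

```python
import math

def spin_loss(logp_real_policy: float, logp_real_ref: float,
              logp_gen_policy: float, logp_gen_ref: float,
              lam: float = 0.1) -> float:
    """Per-example SPIN-style loss l(t) = log(1 + exp(-t)) (sketch).

    logp_*_policy: log p_theta(y|x) under the model being trained.
    logp_*_ref:    log p_theta_t(y|x) under the frozen previous iterate
                   (at iteration 0, the SFT model that produced the synthetic data).
    lam:           hypothetical weight (lambda); its value is not given in this card.
    """
    # Margin: how much more the policy favours the real (ultrachat) response than
    # the self-generated one, each measured relative to the previous iterate.
    t = lam * ((logp_real_policy - logp_real_ref) - (logp_gen_policy - logp_gen_ref))
    # Numerically stable evaluation of log(1 + exp(-t)).
    if t > 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))

# Equal log-probabilities give a zero margin, so the loss is log(2).
print(spin_loss(-10.0, -10.0, -10.0, -10.0))
```

The loss falls below log(2) once the policy raises the real response's likelihood relative to the previous iterate more than it raises the synthetic one's. Training therefore pushes the model away from its own earlier outputs and toward the human-written data.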

Model Details

Model Description

  • Model type: A 7B-parameter GPT-like model fine-tuned on synthetic datasets.
  • Language(s) (NLP): Primarily English
  • License: MIT
  • Finetuned from model: alignment-handbook/zephyr-7b-sft-full (based on mistralai/Mistral-7B-v0.1)

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • optimizer: RMSProp
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2.0
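Two of the listed values follow from the others. First, total_train_batch_size is train_batch_size × num_devices (8 × 8 = 64). Second, the linear scheduler with warmup_ratio 0.1 ramps the learning rate up to 5e-07 over the first 10% of steps and then decays it linearly to zero. The sketch below illustrates both. The helper name `linear_warmup_lr` and the step count of 1000 are hypothetical, and a gradient-accumulation factor of 1 is assumed because none is listed.

```python
import math

def linear_warmup_lr(step: int, total_steps: int,
                     peak_lr: float = 5e-7, warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    # Warmup length derived from the ratio, rounded up to a whole step.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr down to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Effective batch size = per-device batch x number of devices
# (gradient accumulation assumed to be 1, since none is listed).
train_batch_size, num_devices = 8, 8
total_train_batch_size = train_batch_size * num_devices  # 64, as listed above

# Hypothetical step count: the real value depends on the dataset size and
# num_epochs, which this sketch does not reproduce.
total_steps = 1000
for s in (0, 50, 100, 550, 1000):
    print(s, linear_warmup_lr(s, total_steps))
```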

Open LLM Leaderboard Evaluation Results

Detailed per-task results are published on the Open LLM Leaderboard.

Metric                   Value
Avg.                     62.37
ARC (25-shot)            63.65
HellaSwag (10-shot)      84.44
MMLU (5-shot)            61.01
TruthfulQA (0-shot)      50.48
Winogrande (5-shot)      77.98
GSM8K (5-shot)           36.69
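The Avg. row is the unweighted mean of the six benchmark scores. The snippet below recomputes it from the table values as a quick consistency check. The dictionary is simply a transcription of the table.

```python
# Open LLM Leaderboard scores transcribed from the table above.
scores = {
    "ARC (25-shot)": 63.65,
    "HellaSwag (10-shot)": 84.44,
    "MMLU (5-shot)": 61.01,
    "TruthfulQA (0-shot)": 50.48,
    "Winogrande (5-shot)": 77.98,
    "GSM8K (5-shot)": 36.69,
}

# Unweighted mean over the six benchmarks.
avg = sum(scores.values()) / len(scores)
print(f"{avg:.3f}")  # 62.375
```

The result, 62.375, is consistent with the reported Avg. of 62.37 up to rounding.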

Citation

@misc{chen2024selfplay,
      title={Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models}, 
      author={Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
      year={2024},
      eprint={2401.01335},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Model size: 7.24B params (Safetensors)
Tensor type: BF16