FuseAI / FuseChat-Llama-3.1-8B-Instruct

	---
	license: apache-2.0
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	---

	A preview version of FuseChat-3.0, currently under testing.

	## Training configs
	```yaml
	# Model arguments
	model_name_or_path: AALF/FuseChat-Llama-3.1-8B-SFT
	torch_dtype: null
	attn_implementation: flash_attention_2


	# Data training arguments
	dataset_mixer: FuseChat-Mixture-v3-DPO
	dataset_splits:
	- train
	- test
	preprocessing_num_workers: 12

	# DPOTrainer arguments
	bf16: true
	beta: 10
	avg_logp: true
	gradient_accumulation_steps: 8
	gradient_checkpointing: true
	gradient_checkpointing_kwargs:
	  use_reentrant: False
	hub_model_id: wrpo-models
	learning_rate: 8.0e-7
	log_level: info
	logging_steps: 5
	lr_scheduler_type: cosine
	max_length: 2048
	max_prompt_length: 1800
	num_train_epochs: 1
	optim: adamw_torch
	output_dir: outputs/FuseChat-Llama-3.1-8B-Instruct
	run_name: FuseChat-Llama-3.1-8B-Instruct
	per_device_train_batch_size: 2
	per_device_eval_batch_size: 4
	push_to_hub: false
	save_strategy: "steps"
	save_steps: 101
	save_total_limit: 20
	seed: 42
	warmup_ratio: 0.1
	save_only_model: true
	```
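
The config above fixes the per-device batch size and gradient accumulation, but the number of GPUs and the dataset size are not stated in this card. As a rough sketch, the effective batch size and the number of warmup steps could be derived as follows. The device counts and the total step count below are hypothetical placeholders, and rounding the warmup up to a whole step is an assumption about how the trainer treats `warmup_ratio`.

```python
import math

# Values copied from the training config above.
PER_DEVICE_TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of training examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Warmup length in optimizer steps (rounding up is an assumption)."""
    return math.ceil(total_steps * warmup_ratio)


if __name__ == "__main__":
    # Hypothetical device counts; the card does not say how many GPUs were used.
    for num_devices in (1, 4, 8):
        size = effective_batch_size(
            PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, num_devices
        )
        print(f"{num_devices} device(s): effective batch size = {size}")

    # Hypothetical total step count, used only to illustrate the ratio.
    print("warmup steps for 1000 total steps:", warmup_steps(1000, WARMUP_RATIO))
```

With `save_steps: 101` and `save_total_limit: 20`, the same step count also determines how many checkpoints a single epoch produces and whether the limit is ever reached.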

	## Evaluation Results
	| Datasets | Llama3.1-8B-Instruct | FuseChat-Llama-3.1-8B-SFT | FuseChat-Llama-3.1-8B-Instruct |
	|---------------------------------|----------------------|---------------------------|--------------------------------|
	| AlpacaEval-2 (LC/WR) | 28.3/28.7 | 41.3/37.7 | 65.4/63.3 |
	| Arena-Hard (WR/SC) | 28.1/23.8 | 38.7/29 | 58.2/46.4 |
	| MT-Bench | 8.38 | 8.54 | 9 |
	| AlignBench v1.1 | 4.61 | 6.25 | 6.69 |
	| LiveBench 0831 | 27.6 | 30.2 | 32 |
	| GSM8K | 85.9 | 87 | 88 |
	| MATH | 50.7 | 54.7 | 55.2 |
	| AMC 23 | 25 | 30 | 37.5 |
	| MMLU-Pro | 50 | 47.8 | 49.2 |
	| MMLU-redux | 67.2 | 68.4 | 69.2 |
	| GPQA-Diamond | 33.8 | 37.9 | 34.9 |
	| HumanEval | 69.5 | 69.5 | 71.3 |
	| MBPP | 75.4 | 71.4 | 72 |
	| LiveCodeBench 2408-2411 (all/easy) | 12.3/40.5 | 12.6/39 | 13.1/43.2 |
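
To see at a glance where the final model gains or loses ground, the single-score rows of the table can be turned into deltas. This is only an illustrative sketch: the numbers are copied from a subset of the table above, and the helper name `deltas` and the two-decimal rounding are choices made here, not part of the original evaluation.

```python
# (Llama3.1-8B-Instruct, FuseChat-Llama-3.1-8B-SFT, FuseChat-Llama-3.1-8B-Instruct)
# Scores copied from the single-value rows of the table above.
SCORES = {
    "MT-Bench": (8.38, 8.54, 9.0),
    "GSM8K": (85.9, 87.0, 88.0),
    "MATH": (50.7, 54.7, 55.2),
    "MMLU-Pro": (50.0, 47.8, 49.2),
    "GPQA-Diamond": (33.8, 37.9, 34.9),
    "HumanEval": (69.5, 69.5, 71.3),
    "MBPP": (75.4, 71.4, 72.0),
}


def deltas(scores: dict) -> dict:
    """Map each benchmark to (final - base, final - SFT), rounded to 2 decimals."""
    return {
        name: (round(final - base, 2), round(final - sft, 2))
        for name, (base, sft, final) in scores.items()
    }


if __name__ == "__main__":
    for name, (vs_base, vs_sft) in deltas(SCORES).items():
        print(f"{name:<14} vs base: {vs_base:+.2f}   vs SFT: {vs_sft:+.2f}")
```

A negative first value marks a benchmark where the final model scores below the Llama baseline, and a negative second value marks one where it scores below the SFT checkpoint.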