---
license: apache-2.0
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

A preview version of FuseChat-3.0, currently under testing.
## Training configs
```yaml
# Model arguments
model_name_or_path: AALF/FuseChat-Llama-3.1-8B-SFT
torch_dtype: null
attn_implementation: flash_attention_2


# Data training arguments
dataset_mixer: FuseChat-Mixture-v3-DPO
dataset_splits:
- train
- test
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 10
avg_logp: true
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: wrpo-models
learning_rate: 8.0e-7
log_level: info
logging_steps: 5
lr_scheduler_type: cosine
max_length: 2048
max_prompt_length: 1800
num_train_epochs: 1
optim: adamw_torch
output_dir: outputs/FuseChat-Llama-3.1-8B-Instruct
run_name: FuseChat-Llama-3.1-8B-Instruct
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: false
save_strategy: "steps"
save_steps: 101
save_total_limit: 20
seed: 42
warmup_ratio: 0.1
save_only_model: true
```
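The `beta: 10` and `avg_logp: true` settings suggest a length-normalized DPO objective: per-token log-probabilities are averaged over the response length before the preference margin is scaled by `beta`. The sketch below illustrates that objective for a single preference pair. It is a simplified illustration, not the actual trainer code: the function name is hypothetical, and the reference-model log-ratio is assumed to be folded into the inputs.

```python
import math

def dpo_pair_loss(chosen_logps, rejected_logps, beta=10.0, avg_logp=True):
    """Sketch of a length-normalized DPO loss for one preference pair.

    chosen_logps / rejected_logps: per-token log-probabilities of the
    chosen and rejected responses (hypothetical simplification: the
    reference-model term is assumed to be subtracted upstream).
    """
    if avg_logp:  # avg_logp: true -> normalize by response length
        chosen = sum(chosen_logps) / len(chosen_logps)
        rejected = sum(rejected_logps) / len(rejected_logps)
    else:  # plain DPO sums log-probs over tokens
        chosen = sum(chosen_logps)
        rejected = sum(rejected_logps)
    margin = beta * (chosen - rejected)  # beta: 10 in the config above
    # -log(sigmoid(margin)), computed in a numerically stable form
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With `avg_logp` enabled, a longer response is not rewarded merely for accumulating more log-probability mass, which is one common motivation for length normalization in preference optimization.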

## Evaluation Results
| Datasets                        | Llama3.1-8B-Instruct | FuseChat-Llama-3.1-8B-SFT | FuseChat-Llama-3.1-8B-Instruct |
|---------------------------------|----------------------|---------------------------|--------------------------------|
| AlpacaEval-2 (LC/WR)            | 28.3/28.7             | 41.3/37.7                  | 65.4/63.3                       |
| Arena-Hard (WR/SC)              | 28.1/23.8             | 38.7/29                    | 58.2/46.4                       |
| MT-Bench                        | 8.38                  | 8.54                       | 9                              |
| AlignBench v1.1                 | 4.61                  | 6.25                       | 6.69                           |
| LiveBench 0831                  | 27.6                  | 30.2                       | 32                             |
| GSM8K                           | 85.9                  | 87                         | 88                             |
| MATH                            | 50.7                  | 54.7                       | 55.2                           |
| AMC 23                          | 25                    | 30                         | 37.5                           |
| MMLU-Pro                        | 50                    | 47.8                       | 49.2                           |
| MMLU-redux                      | 67.2                  | 68.4                       | 69.2                           |
| GPQA-Diamond                    | 33.8                  | 37.9                       | 34.9                           |
| HumanEval                       | 69.5                  | 69.5                       | 71.3                           |
| MBPP                            | 75.4                  | 71.4                       | 72                             |
| LiveCodeBench 2408-2411 (all/easy) | 12.3/40.5          | 12.6/39                    | 13.1/43.2                       |