---
license: llama3.2
datasets:
- HuggingFaceH4/ultrafeedback_binarized
base_model:
- AIR-hl/Llama-3.2-1B-ultrachat200k
pipeline_tag: text-generation
tags:
- trl
- llama
- dpo
- alignment
- transformers
- custom
- chat
---
# Llama-3.2-1B-DPO


## Model Details

- **Model type:** causal language model aligned with DPO
- **License:** llama3.2
- **Finetuned from model:** [AIR-hl/Llama-3.2-1B-ultrachat200k](https://huggingface.co/AIR-hl/Llama-3.2-1B-ultrachat200k)
- **Training data:** [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized)
- **Training framework:** [trl](https://github.com/huggingface/trl)
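
A minimal usage sketch, under two assumptions: the checkpoint is published as `AIR-hl/Llama-3.2-1B-DPO` (adjust the repo id if not) and it keeps the base model's chat template.

```python
# Inference sketch: load the DPO-aligned checkpoint and run a single chat turn.
# NOTE: the repo id is an assumption based on this card's title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AIR-hl/Llama-3.2-1B-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```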

## Training Details

devices: 4 * NPU 910B-64GB \
precision: bf16 mixed-precision \
global_batch_size: 64

### Training Hyperparameters

`attn_implementation`: None \
`beta`: 0.1 \
`bf16`: True \
`learning_rate`: 1e-6 \
`lr_scheduler_type`: cosine \
`per_device_train_batch_size`: 8 \
`gradient_accumulation_steps`: 2 \
`torch_dtype`: bfloat16 \
`num_train_epochs`: 1 \
`max_prompt_length`: 512 \
`max_length`: 1024 \
`warmup_ratio`: 0.05
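
The global batch size stated above follows from these settings: 8 (per-device batch) × 2 (gradient-accumulation steps) × 4 (devices) = 64.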

### Results

`init_train_loss`: 0.6958 \
`final_train_loss`: 0.5375 \
`accuracy`: 0.7188 \
`reward_margin`: 0.7227
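
For reference, the per-pair loss DPO minimizes (with the β above, 0.1) is

$$\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left(\beta\left[\log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right]\right)$$

where $y_w$/$y_l$ are the chosen/rejected responses. `accuracy` and `reward_margin` here presumably correspond to TRL's `rewards/accuracies` (fraction of pairs whose implicit chosen reward exceeds the rejected one) and `rewards/margins` (mean chosen-minus-rejected reward gap).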

### Training script

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import (
    DPOConfig,
    DPOTrainer,
    ModelConfig,
    ScriptArguments,
    TrlParser,
    get_kbit_device_map,
    get_peft_config,
    get_quantization_config,
)
from trl.trainer.utils import SIMPLE_CHAT_TEMPLATE

if __name__ == "__main__":
    parser = TrlParser((ScriptArguments, DPOConfig, ModelConfig))
    script_args, training_args, model_config = parser.parse_args_and_config()

    # Resolve ModelConfig's dtype string into an actual torch dtype ("auto"/None pass through unchanged)
    torch_dtype = (
        model_config.torch_dtype
        if model_config.torch_dtype in ["auto", None]
        else getattr(torch, model_config.torch_dtype)
    )

    quantization_config = get_quantization_config(model_config)

    # Keyword arguments shared by the policy and (when no PEFT adapter is used) the reference model
    model_kwargs = dict(
        revision=model_config.model_revision,
        attn_implementation=model_config.attn_implementation,
        torch_dtype=torch_dtype,
        use_cache=False if training_args.gradient_checkpointing else True,
        device_map=get_kbit_device_map() if quantization_config is not None else None,
        quantization_config=quantization_config,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_config.model_name_or_path, trust_remote_code=model_config.trust_remote_code, **model_kwargs
    )

    # Without PEFT, DPO needs a frozen copy of the policy as the reference model;
    # with PEFT, ref_model stays None and the trainer uses the base weights of the adapted model.
    peft_config = get_peft_config(model_config)
    if peft_config is None:
        ref_model = AutoModelForCausalLM.from_pretrained(
            model_config.model_name_or_path, trust_remote_code=model_config.trust_remote_code, **model_kwargs
        )
    else:
        ref_model = None

    tokenizer = AutoTokenizer.from_pretrained(
        model_config.model_name_or_path, trust_remote_code=model_config.trust_remote_code
    )
    # Fall back to sensible defaults if the checkpoint lacks a pad token or chat template
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    if tokenizer.chat_template is None:
        tokenizer.chat_template = SIMPLE_CHAT_TEMPLATE
    # Optional DDP workaround: exclude boolean buffers from distributed synchronization
    if script_args.ignore_bias_buffers:
        model._ddp_params_and_buffers_to_ignore = [
            name for name, buffer in model.named_buffers() if buffer.dtype == torch.bool
        ]

    # ultrafeedback_binarized already ships prompt/chosen/rejected columns; keep only those
    dataset = load_dataset(script_args.dataset_name, split=script_args.dataset_train_split)
    dataset = dataset.select_columns(["chosen", "prompt", "rejected"])

    # DPOTrainer tokenizes and collates the preference pairs internally via processing_class
    trainer = DPOTrainer(
        model,
        ref_model,
        args=training_args,
        train_dataset=dataset,
        processing_class=tokenizer,
        peft_config=peft_config,
    )

    trainer.train()

    trainer.save_model(training_args.output_dir)
```
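
Because all arguments flow through `TrlParser`, the hyperparameters listed above can be passed as command-line flags (e.g. `--beta 0.1 --learning_rate 1e-6 --max_length 1024 --dataset_name HuggingFaceH4/ultrafeedback_binarized`) or collected in a YAML config, and the script can be launched with `accelerate launch` to span the 4 NPUs described in Training Details.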