Safetensors · llama

AALF committed · verified
Commit 588b1da · Parent(s): e4ce3b4

Create README.md

Files changed (1): README.md (+56 −0)
---
license: apache-2.0
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

A preview version of FuseChat-3.0, currently under testing.

Training config:
```yaml
### model
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: FuseChat-Mixture-v3-SFT
template: llama3
cutoff_len: 2048
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: LLaMA-Factory/saves/llama31/FuseChat-Llama-3.1-8B-SFT-preview
logging_steps: 10
save_steps: 10086
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 2
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### custom
do_eval: false
packing: false
train_on_prompt: false
flash_attn: fa2
save_strategy: "no"
save_total_limit: 1
seed: 42
save_only_model: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
```
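Since this is a LLaMA-Factory recipe (note the `examples/deepspeed/...` path and `LLaMA-Factory/saves/...` output dir), it can presumably be launched with the `llamafactory-cli` entry point; the config filename below is hypothetical:

```shell
# Hypothetical launch: save the YAML above as fusechat_sft.yaml inside a
# LLaMA-Factory checkout, then start full-parameter SFT (DeepSpeed ZeRO-3
# is picked up from the `deepspeed:` key in the config).
llamafactory-cli train fusechat_sft.yaml
```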
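The `### train` block pins the per-device batch size and accumulation steps, but the effective global batch size also depends on the GPU count, which the card does not state. A minimal sketch, assuming a single 8-GPU node:

```python
# Effective global batch size implied by the training config above.
per_device_train_batch_size = 8   # from the config
gradient_accumulation_steps = 2   # from the config
num_gpus = 8                      # assumption: one 8-GPU node (not stated in the card)

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 128
```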