layoric committed on
Commit 44302d3
1 Parent(s): 68f58eb

Update README.md

Files changed (1)
  1. README.md +74 -5
README.md CHANGED
@@ -12,8 +12,80 @@ pipeline_tag: text2text-generation
 
  Trained for 3 epochs on the `theblackcat102/evol-codealpaca-v1` dataset; scored a decent perplexity of 4.36 on a locally run evaluation.
 
- ## Training procedure
+ ## Axolotl config used
+ 
+ ```yaml
+ base_model: NousResearch/Llama-2-13b-hf
+ base_model_config: NousResearch/Llama-2-13b-hf
+ model_type: LlamaForCausalLM
+ tokenizer_type: LlamaTokenizer
+ push_dataset_to_hub:
+ hub_model_id: llama-2-13b-cot-alpaca-qlora
+ 
+ load_in_8bit: false
+ load_in_4bit: true
+ strict: false
+ 
+ datasets:
+   - path: theblackcat102/evol-codealpaca-v1
+     type: alpaca
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.01
+ output_dir: /checkpoints/llama-2-13b-qlora
+ 
+ adapter: qlora
+ lora_model_dir:
+ 
+ sequence_len: 4096
+ max_packed_sequence_len: 4096
+ lora_r: 32
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+ lora_target_linear: true
+ lora_fan_in_fan_out:
+ 
+ wandb_project:
+ wandb_watch:
+ wandb_run_id:
+ wandb_log_model:
 
+ gradient_accumulation_steps: 2
+ micro_batch_size: 2
+ num_epochs: 3
+ optimizer: paged_adamw_32bit
+ lr_scheduler: cosine
+ learning_rate: 0.0001
+ 
+ train_on_inputs: false
+ group_by_length: true
+ bf16: true
+ fp16: false
+ tf32: true
+ 
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention: true
+ flash_attention:
+ 
+ warmup_steps: 10
+ eval_steps: 50
+ save_steps:
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   bos_token: "<s>"
+   eos_token: "</s>"
+   unk_token: "<unk>"
+ ```
+ 
+ ## Training procedure
 
  The following `bitsandbytes` quantization config was used during training:
  - load_in_8bit: False
@@ -58,10 +130,7 @@ The following `bitsandbytes` quantization config was used during training:
  - bnb_4bit_quant_type: nf4
  - bnb_4bit_use_double_quant: True
  - bnb_4bit_compute_dtype: bfloat16
- ### Framework versions
 
- - PEFT 0.5.0.dev0
- - PEFT 0.5.0.dev0
- - PEFT 0.5.0.dev0
 
+ ### Framework versions
  - PEFT 0.5.0.dev0
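
For readers mapping the Axolotl fields above onto PEFT directly: `lora_r`, `lora_alpha`, `lora_dropout`, and `lora_target_linear` correspond roughly to the `LoraConfig` below. This is an illustrative sketch, not Axolotl's literal internals; the explicit Llama-2 module list stands in for `lora_target_linear: true` and is an assumption here.

```python
from peft import LoraConfig

# Rough PEFT-level equivalent of the lora_* fields in the Axolotl config above.
# `lora_target_linear: true` means "apply LoRA to every linear projection";
# for Llama-2 that is commonly spelled out as the module names below (assumed).
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```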
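
The bitsandbytes settings recorded in the README (4-bit nf4, double quantization, bfloat16 compute) translate directly into a `BitsAndBytesConfig`. Below is a minimal sketch of loading the quantized base model and attaching the trained QLoRA adapter with PEFT; `adapter_id` is taken from the `hub_model_id` field above and is an assumption, so substitute the actual repository path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit settings mirroring the bitsandbytes config in the README:
# nf4 quantization, double quantization, bfloat16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_id = "NousResearch/Llama-2-13b-hf"
adapter_id = "llama-2-13b-cot-alpaca-qlora"  # assumed from hub_model_id; adjust to the published repo path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the quantized base model (PEFT 0.5.x).
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

With `device_map="auto"` the quantized 13B weights are sharded across whatever GPUs are visible; the LoRA adapter weights themselves are loaded unquantized on top.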
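
The 4.36 perplexity quoted above was measured locally; the exact procedure is not given in this commit, but a common way to compute such a figure is to average token-level cross-entropy over held-out samples (for example, the 1% validation split implied by `val_set_size: 0.01`) and exponentiate. A rough sketch, assuming a `model`/`tokenizer` pair like the one loaded in the previous snippet:

```python
import math
import torch

def perplexity(model, tokenizer, texts, max_length=4096):
    """Token-weighted perplexity over a list of raw text samples."""
    device = next(model.parameters()).device
    total_nll, total_tokens = 0.0, 0
    model.eval()
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
            input_ids = enc["input_ids"].to(device)
            # With labels supplied, the model returns the mean cross-entropy
            # over the predicted (shifted-by-one) tokens.
            loss = model(input_ids=input_ids, labels=input_ids).loss
            n_predicted = input_ids.size(1) - 1
            total_nll += loss.item() * n_predicted
            total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)

# Example usage with held-out samples, e.g. from theblackcat102/evol-codealpaca-v1:
# texts = [...]
# print(f"perplexity: {perplexity(model, tokenizer, texts):.2f}")
```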