danielpark committed on
Commit
5087a01
1 Parent(s): d262ef6

Update README.md

Files changed (1)
  1. README.md +0 -75
README.md CHANGED
@@ -59,78 +59,3 @@ The model weights and dataset have not been properly curated yet and are strictl
  - Code modification using flash attention 2.
  - Dataset refinement and adding hash for freezing.
  <br>
-
- ## Revision Infomations
-
- ### # Revision 01: [6d30494fa8da84128499d55075eef57094336d03](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct/commit/6d30494fa8da84128499d55075eef57094336d03)
- - 19.74k fine-tuned model weight
- - max_seq_length = 2048, partially modified tokenizer(related with pad token), default train param, tokenizer need to be fixed (refer 10k tokenizer)
-
- <details>
- <summary>See details</summary>
-
- | **Training Process** | |
- |----------------------------------------------|-------------------------------|
- | Tokenizer Used | LlamaTokenizerFast |
- | Training Progress (Epoch 3.15/16) | |
- | Step | 19740/100000 |
- | Google Colab Resource Usage | 150 tokens used |
-
-
-
-
- | **System Information** | | |
- |------------------------|------------|------------|
- | | **Used** | **Total** |
- | System RAM | 5.8 GB | 83.5 GB |
- | GPU RAM | 26.6 GB | 40.0 GB |
- | Disk | 74.0 GB | 166.8 GB |
-
-
-
-
- | **Basic Training Settings** | |
- |-----------------------------|---------------------------------|
- | local_rank | -1 |
- | per_device_train_batch_size | 4 |
- | per_device_eval_batch_size | 1 |
- | gradient_accumulation_steps | 4 |
- | learning_rate | 2e-4 |
- | max_grad_norm | 0.3 |
- | weight_decay | 0.001 |
- | max_seq_length | 2048 |
- | num_train_epochs | 1 |
- | max_steps | 100000 |
- | warmup_ratio | 0.03 |
- | save_steps | 500000 |
- | logging_steps | 10000 |
-
- | **4-bit Precision Settings** | |
- |-----------------------------|---------------------------------|
- | use_4bit | True |
- | use_nested_quant | False |
- | bnb_4bit_compute_dtype | "bfloat16" |
- | bnb_4bit_quant_type | "nf4" |
-
- | **LoRA Settings** | |
- |-----------------------------|---------------------------------|
- | lora_alpha | 16 |
- | lora_dropout | 0.1 |
- | lora_r | 64 |
-
- | **Advanced Training Flags** | |
- |-----------------------------|---------------------------------|
- | fp16 | False |
- | bf16 | False |
- | packing | False |
- | gradient_checkpointing | True |
- | optim | "paged_adamw_32bit" |
- | lr_scheduler_type | "constant" |
- | group_by_length | True |
-
- | **GPU Configuration** | |
- |-----------------------------|---------------------------------|
- | device_map | {"": 0} |
-
-
- </details>
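
For reference, the Revision 01 tables removed above describe a 4-bit (QLoRA-style) fine-tuning setup. The sketch below shows one way those hyperparameters could be assembled with `transformers`, `peft`, and `trl`; it is not the author's original training script, and the base model ID, the placeholder dataset, and the exact `SFTTrainer` keyword arguments (which differ across `trl` versions) are assumptions.

```python
# Minimal sketch reconstructing the removed Revision 01 hyperparameters.
# Assumptions: base model ID, placeholder dataset, and a 2023-era trl API
# in which SFTTrainer still accepts max_seq_length / packing directly.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-13b-hf"  # assumed base model

# "4-bit Precision Settings" table
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype
    bnb_4bit_use_double_quant=False,        # use_nested_quant
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
# One interpretation of the "partially modified tokenizer (pad token)" note.
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},                     # "GPU Configuration" table
)

# "LoRA Settings" table
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

# "Basic Training Settings" / "Advanced Training Flags" tables
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_grad_norm=0.3,
    weight_decay=0.001,
    num_train_epochs=1,
    max_steps=100000,
    warmup_ratio=0.03,
    save_steps=500000,
    logging_steps=10000,
    fp16=False,
    bf16=False,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    group_by_length=True,
)

# Placeholder corpus; the actual GORANI training data is not part of this commit.
train_dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=2048,                    # max_seq_length
    packing=False,                          # packing
)
trainer.train()
```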