Update README.md

#1
by REILX - opened
Files changed (1)
  1. README.md +41 -1
README.md CHANGED
@@ -6,4 +6,44 @@ language:
  - en
  tags:
  - code
- ---
+ ---
+
+ ### Base model
+ [microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)
+
+ ### Dataset
+ [Replete-AI/code_bagel](https://huggingface.co/datasets/Replete-AI/code_bagel)
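+
+ For reference, the dataset can be pulled straight from the Hub. A minimal sketch; the `split="train"` argument is an assumption about the dataset layout, so check the dataset card if it differs:
+
+ ```python
+ from datasets import load_dataset
+
+ # Fine-tuning corpus; the split name is assumed, not taken from this card.
+ dataset = load_dataset("Replete-AI/code_bagel", split="train")
+ print(dataset)     # row count and column names
+ print(dataset[0])  # one raw example
+ ```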
+
+ ### Train Loss
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/636f54b95d2050767e4a6317/tOBahj5rDAJzqCmftVdkX.png)
+
+ ### Train State
+ Trainable params: 27852800 || all params: 13988090880 || trainable%: 0.1991
+ Total training duration: 69h 18m 17s
+ {
+   "epoch": 0.9999679800589659,
+   "total_flos": 1.446273483573748e+20,
+   "train_loss": 0.44412665014957775,
+   "train_runtime": 249497.725,
+   "train_samples_per_second": 13.018,
+   "train_steps_per_second": 0.102
+ }
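+
+ The train_runtime of 249,497.725 seconds matches the 69h 18m 17s reported above. The trainable fraction (27,852,800 of 13,988,090,880 parameters, about 0.2%) points to a parameter-efficient method such as LoRA, although the card does not name one. A minimal PEFT sketch under that assumption; the rank, alpha, and target modules below are illustrative, not taken from this run:
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM
+ from peft import LoraConfig, get_peft_model
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "microsoft/Phi-3-medium-128k-instruct",
+     torch_dtype=torch.bfloat16,  # assumed; the card does not state the precision
+     trust_remote_code=True,
+ )
+
+ # Hypothetical adapter settings; only the resulting trainable fraction is known.
+ lora_config = LoraConfig(
+     r=16,
+     lora_alpha=32,
+     target_modules=["qkv_proj", "o_proj"],
+     lora_dropout=0.05,
+     task_type="CAUSAL_LM",
+ )
+
+ model = get_peft_model(model, lora_config)
+ # Prints a line in the same "trainable params || all params || trainable%"
+ # format as the figures above.
+ model.print_trainable_parameters()
+ ```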
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (restated as a TrainingArguments sketch after the list):
+ - learning_rate: 5e-05
+ - train_batch_size: 1
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 16
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 64
+ - optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 1200
+ - num_epochs: 1.0
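+
+ A minimal `transformers.TrainingArguments` sketch of the list above; `output_dir` is a hypothetical path, and the Adam betas and epsilon are the optimizer defaults. With 8 GPUs, the effective train batch size works out to 1 × 16 × 8 = 128, matching total_train_batch_size:
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Launched across 8 GPUs (launcher assumed, e.g. torchrun/accelerate):
+ # effective train batch = 1 per device x 16 accumulation x 8 devices = 128.
+ args = TrainingArguments(
+     output_dir="phi3-medium-code-bagel",  # hypothetical
+     learning_rate=5e-05,
+     per_device_train_batch_size=1,
+     per_device_eval_batch_size=8,
+     seed=42,
+     gradient_accumulation_steps=16,
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-08,
+     lr_scheduler_type="cosine",
+     warmup_steps=1200,
+     num_train_epochs=1.0,
+ )
+ ```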
+
+ ### Notes
+ This is the largest dataset I have personally fine-tuned on, and it took the most time.