---
license: apache-2.0
base_model: EleutherAI/pythia-1b-deduped
tags:
- generated_from_trainer
model-index:
- name: pythia-1b-deduped-wiki_r_64_alpha_16
  results: []
library_name: peft
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# pythia-1b-deduped-wiki_r_64_alpha_16

This model is a fine-tuned version of [EleutherAI/pythia-1b-deduped](https://huggingface.co/EleutherAI/pythia-1b-deduped) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 2.0297
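Assuming this is the usual mean per-token cross-entropy in nats, that loss corresponds to a perplexity of roughly exp(2.0297) ≈ 7.6.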

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

The following `bitsandbytes` quantization config was used during training (a reproduction sketch follows the list):
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
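
For reference, these fields map onto `transformers.BitsAndBytesConfig` roughly as below. This is a minimal sketch, not the original training script (which is not part of this card); the `llm_int8_*` fields above are the library defaults and so are omitted.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization and
# bfloat16 compute, matching the config listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the base model with this quantization applied.
base_model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b-deduped",
    quantization_config=bnb_config,
    device_map="auto",
)
```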

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the sketch after the list):
- learning_rate: 0.0002
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
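
In `transformers` terms these settings correspond roughly to the `TrainingArguments` below. This is a sketch under stated assumptions: `output_dir` is a placeholder, and the optimizer line above is the `Trainer` default (AdamW with those betas and epsilon), so it needs no explicit argument.

```python
from transformers import TrainingArguments

# Effective train batch size: 8 per device x 16 accumulation steps = 128
# (assuming a single GPU, which this card does not state).
training_args = TrainingArguments(
    output_dir="pythia-1b-deduped-wiki_r_64_alpha_16",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    seed=42,
)
```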

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 2.0892        | 0.3   | 71   | 2.0820          |
| 1.9761        | 0.61  | 142  | 2.0623          |
| 1.9197        | 0.91  | 213  | 2.0522          |
| 1.8748        | 1.21  | 284  | 2.0495          |
| 1.9705        | 1.51  | 355  | 2.0388          |
| 1.946         | 1.82  | 426  | 2.0349          |
| 2.002         | 2.12  | 497  | 2.0329          |
| 1.8831        | 2.42  | 568  | 2.0358          |
| 1.8374        | 2.73  | 639  | 2.0297          |

### Framework versions

- PEFT 0.4.0
- Transformers 4.31.0
- PyTorch 2.0.1+cu117
- Datasets 2.13.0
- Tokenizers 0.13.3
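
## Usage

This card does not include inference code. As a hedged sketch, a PEFT LoRA adapter like this one is typically loaded on top of the base model as shown below; the adapter repo id is assumed from the author and model name above and should be replaced with the actual path if it differs.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer.
base = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-1b-deduped",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b-deduped")

# Assumed repo id (author + model name); substitute the real adapter path.
model = PeftModel.from_pretrained(base, "dhmeltzer/pythia-1b-deduped-wiki_r_64_alpha_16")

# Generate a short continuation with the adapted model.
inputs = tokenizer("Wikipedia is a free online encyclopedia", return_tensors="pt").to(base.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```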