TearGosling committed on
Commit
282cb76
1 Parent(s): 0e0bdbd

End of training

Files changed (2)
  1. README.md +146 -0
  2. pytorch_model.bin +2 -2
README.md ADDED
---
library_name: transformers
license: apache-2.0
base_model: mistralai/Mistral-Nemo-Base-2407
tags:
- axolotl
- generated_from_trainer
model-index:
- name: pyg3v1-nemo-3ep-ckpts
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: mistralai/Mistral-Nemo-Base-2407
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

chat_template: chatml

datasets:
  - path: PygTesting/pyg3v1
    type: sharegpt
    conversation: chatml

hub_model_id: PygTesting/pyg3v1-nemo-3ep-ckpts
hub_strategy: every_save
hf_use_auth_token: true

dataset_prepared_path: ./data/pyg3v1-data/tokenized
val_set_size: 0.0
output_dir: ./data/pyg3v1-nemo-2eps-out

sequence_len: 8192
sample_packing: true
#eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: pyg3v1-nemo
wandb_entity:
wandb_watch:
wandb_name: more_eps_lower_lr
wandb_log_model:

#unsloth_cross_entropy_loss: true

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 3
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0000075

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.03
evals_per_epoch: 0
eval_table_size:
saves_per_epoch: 3
debug:
deepspeed: deepspeed_configs/zero1.json
weight_decay: 0.01
fsdp:
fsdp_config:
special_tokens:
  pad_token: <pad>

```

</details><br>
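Since the dataset is loaded with `type: sharegpt` and rendered through the ChatML template (`chat_template: chatml`, `conversation: chatml`), each training conversation should end up in the ChatML turn layout before sample packing. The sketch below is a generic illustration only; the role names and placeholder text are hypothetical, as the contents of PygTesting/pyg3v1 are not shown in this card.

```
<|im_start|>system
{system prompt / persona description}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{model reply}<|im_end|>
```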

# pyg3v1-nemo-3ep-ckpts

This model is a fine-tuned version of [mistralai/Mistral-Nemo-Base-2407](https://huggingface.co/mistralai/Mistral-Nemo-Base-2407) on the PygTesting/pyg3v1 dataset.
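A minimal loading and generation sketch with `transformers` is below. It assumes the weights in this repo (`PygTesting/pyg3v1-nemo-3ep-ckpts`, the `hub_model_id` from the config) are accessible to you, that bf16 hardware is available, and that prompts follow the same ChatML layout used in training; the sampling settings are placeholders rather than tuned recommendations.

```python
# Hedged usage sketch: repo access, bf16 hardware, and the ChatML prompt
# layout are assumptions taken from the training config above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PygTesting/pyg3v1-nemo-3ep-ckpts"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a ChatML-style prompt by hand, ending with an open assistant turn.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello! Who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```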

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained on the PygTesting/pyg3v1 dataset listed in the axolotl config above; `val_set_size` is 0.0, so no evaluation split was held out.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (the effective batch size is derived below):
- learning_rate: 7.5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 29
- num_epochs: 3
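Assuming the usual gradient-accumulation semantics, the effective train batch size follows from micro_batch_size × gradient_accumulation_steps × num_devices = 4 × 4 × 8 = 128 sequences per optimizer step, matching the total_train_batch_size listed above; the eval total of 32 is simply 4 per device across 8 GPUs.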

### Training results


### Framework versions

- Transformers 4.45.0.dev0
- Pytorch 2.4.0+rocm6.1
- Datasets 2.21.0
- Tokenizers 0.19.1
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7b449aeb03898bfe17c0c7d6dcab2c5cf04efe827898a2e8dc6dd15e6df44834
- size 49706
+ oid sha256:1b5ac10dd3afaa3661ced2591c15e36fb7b8ed288066e5b66386150d71339785
+ size 24495615534