neginashz committed
Commit ca8558a
1 parent: 9b1d4f8

End of training

Files changed (2)
  1. README.md +19 -13
  2. Untitled1.ipynb +0 -0
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 datasets:
 - medalpaca/medical_meadow_medqa
 model-index:
-- name: sft-qwen-25-7b-instruct
+- name: sft-qwen-25-7b-instruct-2
   results: []
 ---
 
@@ -50,10 +50,10 @@ wandb_log_model:
 
 gradient_accumulation_steps: 1
 micro_batch_size: 1
-num_epochs: 1
+num_epochs: 2
 optimizer: adamw_torch
 lr_scheduler: cosine
-learning_rate: 0.00001
+learning_rate: 0.000005
 
 train_on_inputs: false
 group_by_length: false
@@ -68,8 +68,8 @@ xformers_attention:
 flash_attention: true
 
 warmup_steps:
-eval_steps: 10
-save_steps: 40
+eval_steps:
+save_steps:
 
 evals_per_epoch:
 saves_per_epoch:
@@ -81,10 +81,11 @@ fsdp:
 fsdp_config:
 special_tokens:
 
-hub_model_id: neginashz/sft-qwen-25-7b-instruct
-hub_strategy: all_checkpoints
-early_stopping_patience: 3
+hub_model_id: neginashz/sft-qwen-25-7b-instruct-2
+hub_strategy:
+early_stopping_patience:
 
+resume_from_checkpoint:
 auto_resume_from_checkpoints: true
 
 
@@ -93,11 +94,11 @@ auto_resume_from_checkpoints: true
 
 </details><br>
 
-# sft-qwen-25-7b-instruct
+# sft-qwen-25-7b-instruct-2
 
 This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the medalpaca/medical_meadow_medqa dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.1055
+- Loss: 0.1054
 
 ## Model description
 
@@ -116,7 +117,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 1e-05
+- learning_rate: 5e-06
 - train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
@@ -126,8 +127,8 @@ The following hyperparameters were used during training:
 - total_eval_batch_size: 4
 - optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 2
-- num_epochs: 1
+- lr_scheduler_warmup_steps: 4
+- num_epochs: 2
 
 ### Training results
 
@@ -141,6 +142,11 @@ The following hyperparameters were used during training:
 | 0.1068 | 0.7407 | 60 | 0.1101 |
 | 0.1061 | 0.8642 | 70 | 0.1056 |
 | 0.118 | 0.9877 | 80 | 0.1055 |
+| 0.0644 | 1.1111 | 90 | 0.1054 |
+| 0.0554 | 1.2346 | 100 | 0.1054 |
+| 0.0564 | 1.3580 | 110 | 0.1054 |
+| 0.0601 | 1.4815 | 120 | 0.1054 |
+| 0.0482 | 2.0 | 162 | 0.1054 |
 
 
 ### Framework versions
 
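To make the updated schedule concrete, here is a minimal PyTorch sketch of what the new settings amount to: AdamW at lr 5e-06 with a 4-step warmup into a cosine decay over the 162 optimizer steps implied by the results table (81 steps per epoch for 2 epochs). The `model` below is a stand-in and the forward/backward pass is elided; the actual run was driven by axolotl, not this loop, and `get_cosine_schedule_with_warmup` is the standard `transformers` helper used here only to mirror the config.

```python
# Illustrative sketch of the updated optimizer/scheduler; the real run used
# axolotl's adamw_torch + cosine settings, this just mirrors them in PyTorch.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for Qwen2.5-7B-Instruct

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-06,            # learning_rate: 0.000005
    betas=(0.9, 0.999),
    eps=1e-08,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=4,      # lr_scheduler_warmup_steps: 4
    num_training_steps=162,  # 81 steps/epoch * num_epochs: 2 (table ends at step 162 = epoch 2.0)
)

for step in range(162):
    # forward/backward elided; just advance the optimizer and schedule
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```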
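For anyone who wants to try the resulting checkpoint, a minimal inference sketch follows. It assumes the `hub_model_id` repository `neginashz/sft-qwen-25-7b-instruct-2` holds full model weights rather than a PEFT adapter, and it uses the standard `transformers` chat-template API; the question text is a placeholder, and `device_map="auto"` additionally requires `accelerate`.

```python
# Minimal inference sketch; assumes full weights (not an adapter) were pushed
# to the hub_model_id from the config above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neginashz/sft-qwen-25-7b-instruct-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Placeholder MedQA-style prompt, matching the fine-tuning data's format.
messages = [{"role": "user", "content": "<your multiple-choice medical question>"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the repository actually contains a PEFT adapter instead of merged weights, load the base Qwen/Qwen2.5-7B-Instruct first and attach the adapter with `peft` instead of loading the repo directly.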
Untitled1.ipynb CHANGED
The diff for this file is too large to render. See raw diff