	---
	library_name: transformers
	license: apache-2.0
	base_model: amd/AMD-Llama-135m
	tags:
	- generated_from_trainer
	model-index:
	- name: amdchess-v8
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# amdchess-v8

	This model is a fine-tuned version of [amd/AMD-Llama-135m](https://huggingface.co/amd/AMD-Llama-135m) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.7861
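
The evaluation loss can be turned into a perplexity, which is often easier to compare across runs. The sketch below assumes the reported loss is the mean token-level cross-entropy in nats, as is usual for causal language models trained with the Trainer; the card itself does not state this. `eval_loss` is copied from the result above.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 0.7861

# Assumption: the loss is mean cross-entropy per token, measured in nats.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.3f}")
```

Under that assumption the model's evaluation perplexity is roughly 2.19.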

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: grokadamw with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
	- lr_scheduler_type: cosine
	- num_epochs: 0.25
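
As a rough illustration, the hyperparameters above could map onto `transformers.TrainingArguments` as sketched below. This is not the original training script: `output_dir` is a hypothetical path, the per-device batch sizes assume a single device, and the `grokadamw` optimizer needs the separate `grokadamw` package to be installed. The import happens inside the helper so the settings can be inspected without loading the library.

```python
# Keyword arguments mirroring the hyperparameters listed above.
training_kwargs = {
    "output_dir": "amdchess-v8",       # hypothetical output path
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 8,  # assumes training on a single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "grokadamw",              # requires the `grokadamw` package
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 0.25,
}


def build_training_arguments():
    """Create TrainingArguments from the settings above (imports lazily)."""
    from transformers import TrainingArguments

    return TrainingArguments(**training_kwargs)
```

Calling `build_training_arguments()` returns an object that can be passed to a `Trainer` together with the model and datasets, which this card does not describe.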

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 3.3296 | 0.0030 | 5 | 2.6555 |
	| 1.7617 | 0.0059 | 10 | 1.6829 |
	| 1.344 | 0.0089 | 15 | 1.3500 |
	| 1.1587 | 0.0118 | 20 | 1.1881 |
	| 1.1949 | 0.0148 | 25 | 1.1602 |
	| 1.0248 | 0.0177 | 30 | 1.1076 |
	| 1.1176 | 0.0207 | 35 | 1.1118 |
	| 0.9583 | 0.0236 | 40 | 1.0467 |
	| 1.1116 | 0.0266 | 45 | 1.0376 |
	| 0.9633 | 0.0295 | 50 | 1.0231 |
	| 0.9704 | 0.0325 | 55 | 1.0089 |
	| 1.0535 | 0.0354 | 60 | 1.0089 |
	| 0.9668 | 0.0384 | 65 | 0.9763 |
	| 0.9767 | 0.0413 | 70 | 0.9681 |
	| 0.9745 | 0.0443 | 75 | 0.9648 |
	| 0.9497 | 0.0472 | 80 | 0.9631 |
	| 0.9192 | 0.0502 | 85 | 0.9406 |
	| 0.9581 | 0.0531 | 90 | 0.9435 |
	| 0.8981 | 0.0561 | 95 | 0.9271 |
	| 0.9811 | 0.0590 | 100 | 0.9287 |
	| 0.8313 | 0.0620 | 105 | 0.9138 |
	| 0.898 | 0.0649 | 110 | 0.9120 |
	| 0.954 | 0.0679 | 115 | 0.9109 |
	| 0.9523 | 0.0708 | 120 | 0.9067 |
	| 0.948 | 0.0738 | 125 | 0.9001 |
	| 0.8825 | 0.0767 | 130 | 0.8932 |
	| 0.9259 | 0.0797 | 135 | 0.8908 |
	| 0.7937 | 0.0826 | 140 | 0.8831 |
	| 0.8315 | 0.0856 | 145 | 0.8794 |
	| 0.8488 | 0.0885 | 150 | 0.8800 |
	| 0.8648 | 0.0915 | 155 | 0.8726 |
	| 0.8976 | 0.0945 | 160 | 0.8701 |
	| 0.9298 | 0.0974 | 165 | 0.8650 |
	| 0.8856 | 0.1004 | 170 | 0.8635 |
	| 0.7848 | 0.1033 | 175 | 0.8584 |
	| 0.8366 | 0.1063 | 180 | 0.8526 |
	| 0.8413 | 0.1092 | 185 | 0.8531 |
	| 0.8577 | 0.1122 | 190 | 0.8498 |
	| 0.8641 | 0.1151 | 195 | 0.8457 |
	| 0.7957 | 0.1181 | 200 | 0.8429 |
	| 0.8379 | 0.1210 | 205 | 0.8454 |
	| 0.7596 | 0.1240 | 210 | 0.8404 |
	| 0.8703 | 0.1269 | 215 | 0.8390 |
	| 0.7297 | 0.1299 | 220 | 0.8327 |
	| 0.885 | 0.1328 | 225 | 0.8299 |
	| 0.7785 | 0.1358 | 230 | 0.8300 |
	| 0.851 | 0.1387 | 235 | 0.8264 |
	| 0.7234 | 0.1417 | 240 | 0.8222 |
	| 0.7917 | 0.1446 | 245 | 0.8226 |
	| 0.8123 | 0.1476 | 250 | 0.8195 |
	| 0.7801 | 0.1505 | 255 | 0.8170 |
	| 0.7086 | 0.1535 | 260 | 0.8156 |
	| 0.8673 | 0.1564 | 265 | 0.8137 |
	| 0.8298 | 0.1594 | 270 | 0.8144 |
	| 0.8097 | 0.1623 | 275 | 0.8113 |
	| 0.8079 | 0.1653 | 280 | 0.8095 |
	| 0.7917 | 0.1682 | 285 | 0.8079 |
	| 0.8206 | 0.1712 | 290 | 0.8058 |
	| 0.8438 | 0.1741 | 295 | 0.8037 |
	| 0.8519 | 0.1771 | 300 | 0.8015 |
	| 0.8844 | 0.1800 | 305 | 0.8016 |
	| 0.8217 | 0.1830 | 310 | 0.7998 |
	| 0.6939 | 0.1860 | 315 | 0.7982 |
	| 0.8021 | 0.1889 | 320 | 0.7975 |
	| 0.8357 | 0.1919 | 325 | 0.7961 |
	| 0.8487 | 0.1948 | 330 | 0.7945 |
	| 0.648 | 0.1978 | 335 | 0.7936 |
	| 0.7599 | 0.2007 | 340 | 0.7924 |
	| 0.8203 | 0.2037 | 345 | 0.7923 |
	| 0.8072 | 0.2066 | 350 | 0.7915 |
	| 0.8278 | 0.2096 | 355 | 0.7904 |
	| 0.7202 | 0.2125 | 360 | 0.7898 |
	| 0.7229 | 0.2155 | 365 | 0.7891 |
	| 0.8432 | 0.2184 | 370 | 0.7887 |
	| 0.8615 | 0.2214 | 375 | 0.7879 |
	| 0.8234 | 0.2243 | 380 | 0.7875 |
	| 0.8101 | 0.2273 | 385 | 0.7871 |
	| 0.8464 | 0.2302 | 390 | 0.7868 |
	| 0.7966 | 0.2332 | 395 | 0.7866 |
	| 0.718 | 0.2361 | 400 | 0.7864 |
	| 0.741 | 0.2391 | 405 | 0.7863 |
	| 0.7903 | 0.2420 | 410 | 0.7862 |
	| 0.7671 | 0.2450 | 415 | 0.7861 |
	| 0.7657 | 0.2479 | 420 | 0.7861 |
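
The Epoch and Step columns together give a rough idea of the training set size, which the card otherwise leaves unspecified. The sketch below uses the last logged row and the batch size from the hyperparameter list. It assumes a single device and no gradient accumulation (neither is stated in the card), and the logged epoch values are rounded to four decimals, so the numbers are estimates only.

```python
# Last logged row of the results table.
last_step = 420
last_epoch = 0.2479
train_batch_size = 8  # from the hyperparameter list

# Optimizer steps that make up one full epoch.
steps_per_epoch = last_step / last_epoch

# Assumption: one device and gradient_accumulation_steps == 1.
approx_train_examples = steps_per_epoch * train_batch_size

# Steps needed for the configured 0.25 epochs.
planned_steps = 0.25 * steps_per_epoch

print(round(steps_per_epoch), round(approx_train_examples), round(planned_steps))
```

The estimate comes out at about 1,694 steps per epoch, so a 0.25-epoch run lasts roughly 424 steps. That is consistent with the table ending at step 420, the last multiple of the 5-step evaluation interval.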


	### Framework versions

	- Transformers 4.46.0
	- Pytorch 2.5.0+cu121
	- Datasets 3.0.2
	- Tokenizers 0.20.1
