07/16/2024 09:47:54 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|configuration_utils.py:733] 2024-07-16 09:48:00,277 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/config.json
[INFO|configuration_utils.py:800] 2024-07-16 09:48:00,280 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|modeling_utils.py:3556] 2024-07-16 09:48:00,330 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/model.safetensors.index.json
[INFO|modeling_utils.py:1531] 2024-07-16 09:48:00,332 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1000] 2024-07-16 09:48:00,334 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128009
}
[INFO|modeling_utils.py:4364] 2024-07-16 09:48:04,157 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4372] 2024-07-16 09:48:04,157 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Meta-Llama-3-8B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:955] 2024-07-16 09:48:04,331 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B-Instruct/snapshots/e1945c40cd546c78e41f1151f4db032b271faeaa/generation_config.json
[INFO|configuration_utils.py:1000] 2024-07-16 09:48:04,332 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": [
    128001,
    128009
  ],
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}
[INFO|checkpointing.py:103] 2024-07-16 09:48:04,339 >> Gradient checkpointing enabled.
[INFO|attention.py:80] 2024-07-16 09:48:04,339 >> Using torch SDPA for faster training and inference.
[INFO|adapter.py:302] 2024-07-16 09:48:04,339 >> Upcasting trainable params to float32.
llamafactory.model.adapter - Fine-tuning method: Full
07/16/2024 09:48:05 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
[INFO|trainer.py:2128] 2024-07-16 09:48:27,728 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-07-16 09:48:27,728 >> Num examples = 4,968
[INFO|trainer.py:2130] 2024-07-16 09:48:27,728 >> Num Epochs = 5
[INFO|trainer.py:2131] 2024-07-16 09:48:27,729 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2134] 2024-07-16 09:48:27,729 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:2135] 2024-07-16 09:48:27,729 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2136] 2024-07-16 09:48:27,729 >> Total optimization steps = 190
[INFO|trainer.py:2137] 2024-07-16 09:48:27,730 >> Number of trainable parameters = 8,030,261,248
[INFO|callbacks.py:310] 2024-07-16 09:48:41,429 >> {'loss': 14.1364, 'learning_rate': 5.0000e-07, 'epoch': 0.03, 3.4262e-09, 'epoch': 4.81, 'throughput': 482.98}
[INFO|callbacks.py:310] 2024-07-16 10:29:43,438 >> {'loss': 0.0015, 'learning_rate': 1.5229e-09, 'epoch': 4.84, 'throughput': 482.95}
[INFO|callbacks.py:310] 2024-07-16 10:29:56,602 >> {'loss': 0.0002, 'learning_rate': 3.8076e-10, 'epoch': 4.86, 'throughput': 482.96}
[INFO|callbacks.py:310] 2024-07-16 10:30:09,755 >> {'loss': 0.0028, 'learning_rate': 0.0000e+00, 'epoch': 4.89, 'throughput': 482.97}
[INFO|trainer.py:3478] 2024-07-16 10:30:17,367 >> Saving model checkpoint to saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/checkpoint-190
[INFO|configuration_utils.py:472] 2024-07-16 10:30:17,370 >> Configuration saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/checkpoint-190/config.json
[INFO|configuration_utils.py:769] 2024-07-16 10:30:17,371 >> Configuration saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/checkpoint-190/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-16 10:30:33,564 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/checkpoint-190/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2574] 2024-07-16 10:30:33,568 >> tokenizer config file saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/checkpoint-190/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-16 10:30:33,568 >> Special tokens file saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/checkpoint-190/special_tokens_map.json
[INFO|trainer.py:2383] 2024-07-16 10:31:10,372 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|trainer.py:3478] 2024-07-16 10:31:17,984 >> Saving model checkpoint to saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3
[INFO|configuration_utils.py:472] 2024-07-16 10:31:17,987 >> Configuration saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/config.json
[INFO|configuration_utils.py:769] 2024-07-16 10:31:17,988 >> Configuration saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-16 10:31:35,440 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2574] 2024-07-16 10:31:35,443 >> tokenizer config file saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-16 10:31:35,444 >> Special tokens file saved in saves/LLaMA3-8B-Chat/full/train_2024-07-16-09-46-28_llama3/special_tokens_map.json
[WARNING|ploting.py:89] 2024-07-16 10:31:36,770 >> No metric eval_loss to plot.
[WARNING|ploting.py:89] 2024-07-16 10:31:36,770 >> No metric eval_accuracy to plot.
[INFO|modelcard.py:449] 2024-07-16 10:31:36,770 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
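
The trainer banner above reports a per-device batch size of 2, 8 gradient accumulation steps, a total train batch size of 128, 4,968 examples, 5 epochs, and 190 optimization steps. The sketch below checks whether these figures are mutually consistent. It rests on two assumptions that the log does not state: that the total batch size is the product of per-device batch size, accumulation steps, and number of processes, and that the step count per epoch is the example count floor-divided by the total batch size. The variable names are illustrative and do not come from LLaMA-Factory or Transformers.

```python
# Values copied from the "Running training" banner in the log above.
num_examples = 4968
num_epochs = 5
per_device_batch = 2
grad_accum_steps = 8
total_batch = 128
reported_steps = 190

# Assumption: total batch = per-device batch * accumulation steps * processes,
# so the process count implied by the banner can be recovered by division.
implied_processes = total_batch // (per_device_batch * grad_accum_steps)

# Assumption: a partial final batch in each epoch does not count as a step.
steps_per_epoch = num_examples // total_batch
total_steps = steps_per_epoch * num_epochs

print(f"implied processes: {implied_processes}")
print(f"steps per epoch:   {steps_per_epoch}")
print(f"total steps:       {total_steps} (log reports {reported_steps})")
```

Under these assumptions the banner implies 8 processes (one GPU each, matching `n_gpu: 1` with `distributed training: True`), and the computed step count agrees with the 190 steps reported in the log.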