07/16/2024 09:07:34 - INFO - llamafactory.data.template - Add pad token: </s>
07/16/2024 09:07:34 - INFO - llamafactory.data.template - Add pad token: </s>
07/16/2024 09:07:34 - INFO - llamafactory.hparams.parser - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|parser.py:325] 2024-07-16 09:07:34,077 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/16/2024 09:07:34 - INFO - llamafactory.hparams.parser - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/16/2024 09:07:34 - INFO - llamafactory.hparams.parser - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/16/2024 09:07:34 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2161] 2024-07-16 09:07:34,347 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/tokenizer.model
[INFO|tokenization_utils_base.py:2161] 2024-07-16 09:07:34,347 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/tokenizer.json
[INFO|tokenization_utils_base.py:2161] 2024-07-16 09:07:34,348 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2161] 2024-07-16 09:07:34,348 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/special_tokens_map.json
[INFO|tokenization_utils_base.py:2161] 2024-07-16 09:07:34,348 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/tokenizer_config.json
07/16/2024 09:07:34 - INFO - llamafactory.data.template - Add pad token: </s>
07/16/2024 09:07:34 - INFO - llamafactory.data.template - Add pad token: </s>
07/16/2024 09:07:34 - INFO - llamafactory.data.template - Add pad token: </s>
07/16/2024 09:07:34 - INFO - llamafactory.data.template - Add pad token: </s>
[INFO|template.py:372] 2024-07-16 09:07:34,452 >> Add pad token: </s>
[INFO|loader.py:50] 2024-07-16 09:07:34,453 >> Loading dataset 0716_truthfulqa_benchmark_train.json...
07/16/2024 09:07:34 - INFO - llamafactory.hparams.parser - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/16/2024 09:07:34 - INFO - llamafactory.data.template - Add pad token: </s>
07/16/2024 09:07:36 - INFO - llamafactory.data.loader - Loading dataset 0716_truthfulqa_benchmark_train.json...
07/16/2024 09:07:36 - INFO - llamafactory.data.loader - Loading dataset 0716_truthfulqa_benchmark_train.json...
07/16/2024 09:07:36 - INFO - llamafactory.data.loader - Loading dataset 0716_truthfulqa_benchmark_train.json...
07/16/2024 09:07:36 - INFO - llamafactory.data.loader - Loading dataset 0716_truthfulqa_benchmark_train.json...
07/16/2024 09:07:36 - INFO - llamafactory.data.loader - Loading dataset 0716_truthfulqa_benchmark_train.json...
07/16/2024 09:07:36 - INFO - llamafactory.data.loader - Loading dataset 0716_truthfulqa_benchmark_train.json...
07/16/2024 09:07:36 - INFO - llamafactory.data.loader - Loading dataset 0716_truthfulqa_benchmark_train.json...
[INFO|configuration_utils.py:733] 2024-07-16 09:07:37,470 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/config.json
[INFO|configuration_utils.py:800] 2024-07-16 09:07:37,473 >> Model config LlamaConfig {
"_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.42.3",
"use_cache": true,
"vocab_size": 32000
}
[INFO|modeling_utils.py:3556] 2024-07-16 09:07:37,523 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/model.safetensors.index.json
[INFO|modeling_utils.py:1531] 2024-07-16 09:07:37,524 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1000] 2024-07-16 09:07:37,526 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
[INFO|modeling_utils.py:4364] 2024-07-16 09:07:54,870 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4372] 2024-07-16 09:07:54,870 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
[INFO|configuration_utils.py:955] 2024-07-16 09:07:55,055 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/generation_config.json
[INFO|configuration_utils.py:1000] 2024-07-16 09:07:55,055 >> Generate config GenerationConfig {
"bos_token_id": 1,
"do_sample": true,
"eos_token_id": 2,
"max_length": 4096,
"pad_token_id": 0,
"temperature": 0.6,
"top_p": 0.9
}
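
For reference, the load recorded above can be reproduced outside LLaMA-Factory with plain transformers; a minimal sketch, assuming access to the gated meta-llama/Llama-2-7b-chat-hf weights (the model id, the bfloat16 compute dtype, and the </s> pad token are taken from this log, everything else is illustrative):

    # Sketch only: mirror the tokenizer/model load recorded in this log.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-chat-hf"  # from the cache paths above

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.pad_token = tokenizer.eos_token  # log: "Add pad token: </s>"

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # log: "compute dtype: torch.bfloat16"
    )
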
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
[INFO|checkpointing.py:103] 2024-07-16 09:07:55,062 >> Gradient checkpointing enabled.
[INFO|attention.py:80] 2024-07-16 09:07:55,062 >> Using torch SDPA for faster training and inference.
[INFO|adapter.py:302] 2024-07-16 09:07:55,062 >> Upcasting trainable params to float32.
[INFO|adapter.py:48] 2024-07-16 09:07:55,062 >> Fine-tuning method: Full
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/16/2024 09:07:55 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
07/16/2024 09:07:55 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
07/16/2024 09:07:55 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
[INFO|loader.py:196] 2024-07-16 09:07:55,174 >> trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
07/16/2024 09:07:55 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
07/16/2024 09:07:55 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/16/2024 09:07:55 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
[INFO|trainer.py:642] 2024-07-16 09:07:55,179 >> Using auto half precision backend
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/16/2024 09:07:55 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/16/2024 09:07:55 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/16/2024 09:07:55 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000
[INFO|trainer.py:2128] 2024-07-16 09:08:14,231 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-07-16 09:08:14,231 >> Num examples = 4,968
[INFO|trainer.py:2130] 2024-07-16 09:08:14,231 >> Num Epochs = 5
[INFO|trainer.py:2131] 2024-07-16 09:08:14,231 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2134] 2024-07-16 09:08:14,231 >> Total train batch size (w. parallel, distributed & accumulation) = 128
[INFO|trainer.py:2135] 2024-07-16 09:08:14,231 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2136] 2024-07-16 09:08:14,231 >> Total optimization steps = 190
[INFO|trainer.py:2137] 2024-07-16 09:08:14,233 >> Number of trainable parameters = 6,738,415,616
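
The header values above are mutually consistent; a quick sanity-check sketch, assuming the 8-GPU world size implied by the "Process rank: 0..7" lines at the top of the log:

    # Sketch only: re-derive the batch-size and step counts printed above.
    num_examples = 4968        # "Num examples"
    per_device_batch = 2       # "Instantaneous batch size per device"
    num_gpus = 8               # assumption: ranks 0-7 appear earlier in this log
    grad_accum = 8             # "Gradient Accumulation steps"
    epochs = 5                 # "Num Epochs"

    total_batch = per_device_batch * num_gpus * grad_accum
    print(total_batch)  # 128 -> "Total train batch size"

    micro_batches = -(-num_examples // (per_device_batch * num_gpus))  # ceil -> 311
    steps_per_epoch = micro_batches // grad_accum                      # 38
    print(steps_per_epoch * epochs)  # 190 -> "Total optimization steps"
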
[INFO|callbacks.py:310] 2024-07-16 09:08:27,214 >> {'loss': 8.3599, 'learning_rate': 5.0000e-07, 'epoch': 0.03, 'throughput': 548.54}
[INFO|callbacks.py:310] 2024-07-16 09:08:38,345 >> {'loss': 8.1891, 'learning_rate': 1.0000e-06, 'epoch': 0.05, 'throughput': 575.99}
[INFO|callbacks.py:310] 2024-07-16 09:08:49,459 >> {'loss': 8.0792, 'learning_rate': 1.5000e-06, 'epoch': 0.08, 'throughput': 586.40}
[INFO|callbacks.py:310] 2024-07-16 09:09:00,554 >> {'loss': 7.9682, 'learning_rate': 2.0000e-06, 'epoch': 0.10, 'throughput': 586.87}
[INFO|callbacks.py:310] 2024-07-16 09:09:11,650 >> {'loss': 6.9482, 'learning_rate': 2.5000e-06, 'epoch': 0.13, 'throughput': 599.42}
[INFO|callbacks.py:310] 2024-07-16 09:09:22,743 >> {'loss': 5.1505, 'learning_rate': 3.0000e-06, 'epoch': 0.15, 'throughput': 599.28}
[INFO|callbacks.py:310] 2024-07-16 09:09:33,861 >> {'loss': 4.7491, 'learning_rate': 3.5000e-06, 'epoch': 0.18, 'throughput': 596.99}
[INFO|callbacks.py:310] 2024-07-16 09:09:44,975 >> {'loss': 3.2164, 'learning_rate': 4.0000e-06, 'epoch': 0.21, 'throughput': 600.21}
[INFO|callbacks.py:310] 2024-07-16 09:09:56,098 >> {'loss': 2.7761, 'learning_rate': 4.5000e-06, 'epoch': 0.23, 'throughput': 603.94}
[INFO|callbacks.py:310] 2024-07-16 09:10:07,219 >> {'loss': 0.6703, 'learning_rate': 5.0000e-06, 'epoch': 0.26, 'throughput': 605.96}
[INFO|callbacks.py:310] 2024-07-16 09:10:18,354 >> {'loss': 0.3255, 'learning_rate': 4.9996e-06, 'epoch': 0.28, 'throughput': 605.48}
[INFO|callbacks.py:310] 2024-07-16 09:10:29,443 >> {'loss': 0.3301, 'learning_rate': 4.9985e-06, 'epoch': 0.31, 'throughput': 605.64}
[INFO|callbacks.py:310] 2024-07-16 09:10:40,548 >> {'loss': 0.2121, 'learning_rate': 4.9966e-06, 'epoch': 0.33, 'throughput': 606.15}
[INFO|callbacks.py:310] 2024-07-16 09:10:51,649 >> {'loss': 1.1565, 'learning_rate': 4.9939e-06, 'epoch': 0.36, 'throughput': 607.41}
[INFO|callbacks.py:310] 2024-07-16 09:11:02,736 >> {'loss': 0.8054, 'learning_rate': 4.9905e-06, 'epoch': 0.39, 'throughput': 606.19}
[INFO|callbacks.py:310] 2024-07-16 09:11:13,851 >> {'loss': 0.2386, 'learning_rate': 4.9863e-06, 'epoch': 0.41, 'throughput': 607.52}
[INFO|callbacks.py:310] 2024-07-16 09:11:24,984 >> {'loss': 0.3161, 'learning_rate': 4.9814e-06, 'epoch': 0.44, 'throughput': 606.78}
[INFO|callbacks.py:310] 2024-07-16 09:11:36,116 >> {'loss': 0.2773, 'learning_rate': 4.9757e-06, 'epoch': 0.46, 'throughput': 607.80}
[INFO|callbacks.py:310] 2024-07-16 09:11:47,247 >> {'loss': 0.2062, 'learning_rate': 4.9692e-06, 'epoch': 0.49, 'throughput': 608.19}
[INFO|callbacks.py:310] 2024-07-16 09:11:58,362 >> {'loss': 0.1837, 'learning_rate': 4.9620e-06, 'epoch': 0.51, 'throughput': 609.22}
[INFO|callbacks.py:310] 2024-07-16 09:12:09,457 >> {'loss': 0.1735, 'learning_rate': 4.9541e-06, 'epoch': 0.54, 'throughput': 610.01}
[INFO|callbacks.py:310] 2024-07-16 09:12:20,540 >> {'loss': 0.1588, 'learning_rate': 4.9454e-06, 'epoch': 0.57, 'throughput': 609.91}
[INFO|callbacks.py:310] 2024-07-16 09:12:31,647 >> {'loss': 0.1443, 'learning_rate': 4.9359e-06, 'epoch': 0.59, 'throughput': 610.82}
[INFO|callbacks.py:310] 2024-07-16 09:12:42,733 >> {'loss': 0.1570, 'learning_rate': 4.9257e-06, 'epoch': 0.62, 'throughput': 609.97}
[INFO|callbacks.py:310] 2024-07-16 09:12:53,861 >> {'loss': 0.1199, 'learning_rate': 4.9148e-06, 'epoch': 0.64, 'throughput': 609.21}
[INFO|callbacks.py:310] 2024-07-16 09:13:04,974 >> {'loss': 0.1539, 'learning_rate': 4.9032e-06, 'epoch': 0.67, 'throughput': 609.20}
[INFO|callbacks.py:310] 2024-07-16 09:13:16,096 >> {'loss': 0.1208, 'learning_rate': 4.8908e-06, 'epoch': 0.69, 'throughput': 609.87}
[INFO|callbacks.py:310] 2024-07-16 09:13:27,217 >> {'loss': 0.0954, 'learning_rate': 4.8776e-06, 'epoch': 0.72, 'throughput': 610.39}
[INFO|callbacks.py:310] 2024-07-16 09:13:38,328 >> {'loss': 0.1387, 'learning_rate': 4.8638e-06, 'epoch': 0.75, 'throughput': 611.13}
[INFO|callbacks.py:310] 2024-07-16 09:13:49,415 >> {'loss': 0.1484, 'learning_rate': 4.8492e-06, 'epoch': 0.77, 'throughput': 612.02}
[INFO|callbacks.py:310] 2024-07-16 09:14:00,513 >> {'loss': 0.0998, 'learning_rate': 4.8340e-06, 'epoch': 0.80, 'throughput': 612.22}
[INFO|callbacks.py:310] 2024-07-16 09:14:11,593 >> {'loss': 0.1068, 'learning_rate': 4.8180e-06, 'epoch': 0.82, 'throughput': 612.05}
[INFO|callbacks.py:310] 2024-07-16 09:14:22,685 >> {'loss': 0.0801, 'learning_rate': 4.8013e-06, 'epoch': 0.85, 'throughput': 612.99}
[INFO|callbacks.py:310] 2024-07-16 09:14:33,813 >> {'loss': 0.1066, 'learning_rate': 4.7839e-06, 'epoch': 0.87, 'throughput': 612.89}
[INFO|callbacks.py:310] 2024-07-16 09:14:44,935 >> {'loss': 0.1038, 'learning_rate': 4.7658e-06, 'epoch': 0.90, 'throughput': 613.01}
[INFO|callbacks.py:310] 2024-07-16 09:14:56,047 >> {'loss': 0.1060, 'learning_rate': 4.7470e-06, 'epoch': 0.93, 'throughput': 612.94}
[INFO|callbacks.py:310] 2024-07-16 09:15:07,172 >> {'loss': 0.1107, 'learning_rate': 4.7275e-06, 'epoch': 0.95, 'throughput': 613.01}
[INFO|callbacks.py:310] 2024-07-16 09:15:18,265 >> {'loss': 0.1372, 'learning_rate': 4.7074e-06, 'epoch': 0.98, 'throughput': 613.54}
[INFO|callbacks.py:310] 2024-07-16 09:15:29,366 >> {'loss': 0.0816, 'learning_rate': 4.6865e-06, 'epoch': 1.00, 'throughput': 613.88}
[INFO|callbacks.py:310] 2024-07-16 09:15:40,449 >> {'loss': 0.0743, 'learning_rate': 4.6651e-06, 'epoch': 1.03, 'throughput': 614.30}
[INFO|callbacks.py:310] 2024-07-16 09:15:51,540 >> {'loss': 0.0720, 'learning_rate': 4.6429e-06, 'epoch': 1.05, 'throughput': 614.77}
[INFO|callbacks.py:310] 2024-07-16 09:16:02,629 >> {'loss': 0.0596, 'learning_rate': 4.6201e-06, 'epoch': 1.08, 'throughput': 614.97}
[INFO|callbacks.py:310] 2024-07-16 09:16:13,746 >> {'loss': 0.0544, 'learning_rate': 4.5967e-06, 'epoch': 1.11, 'throughput': 615.46}
[INFO|callbacks.py:310] 2024-07-16 09:16:24,855 >> {'loss': 0.0342, 'learning_rate': 4.5726e-06, 'epoch': 1.13, 'throughput': 615.55}
[INFO|callbacks.py:310] 2024-07-16 09:16:35,985 >> {'loss': 0.0394, 'learning_rate': 4.5479e-06, 'epoch': 1.16, 'throughput': 615.19}
[INFO|callbacks.py:310] 2024-07-16 09:16:47,103 >> {'loss': 0.0196, 'learning_rate': 4.5225e-06, 'epoch': 1.18, 'throughput': 615.36}
[INFO|callbacks.py:310] 2024-07-16 09:16:58,199 >> {'loss': 0.0411, 'learning_rate': 4.4966e-06, 'epoch': 1.21, 'throughput': 615.43}
[INFO|callbacks.py:310] 2024-07-16 09:17:09,282 >> {'loss': 0.0257, 'learning_rate': 4.4700e-06, 'epoch': 1.23, 'throughput': 614.94}
[INFO|callbacks.py:310] 2024-07-16 09:17:20,373 >> {'loss': 0.0289, 'learning_rate': 4.4429e-06, 'epoch': 1.26, 'throughput': 615.29}
[INFO|callbacks.py:310] 2024-07-16 09:17:31,470 >> {'loss': 0.1193, 'learning_rate': 4.4151e-06, 'epoch': 1.29, 'throughput': 615.01}
[INFO|callbacks.py:310] 2024-07-16 09:17:42,559 >> {'loss': 0.0883, 'learning_rate': 4.3868e-06, 'epoch': 1.31, 'throughput': 614.92}
[INFO|callbacks.py:310] 2024-07-16 09:17:53,670 >> {'loss': 0.0377, 'learning_rate': 4.3579e-06, 'epoch': 1.34, 'throughput': 614.86}
[INFO|callbacks.py:310] 2024-07-16 09:18:04,800 >> {'loss': 0.0602, 'learning_rate': 4.3284e-06, 'epoch': 1.36, 'throughput': 614.73}
[INFO|callbacks.py:310] 2024-07-16 09:18:15,923 >> {'loss': 0.0830, 'learning_rate': 4.2983e-06, 'epoch': 1.39, 'throughput': 614.38}
[INFO|callbacks.py:310] 2024-07-16 09:18:27,039 >> {'loss': 0.0358, 'learning_rate': 4.2678e-06, 'epoch': 1.41, 'throughput': 614.72}
[INFO|callbacks.py:310] 2024-07-16 09:18:38,136 >> {'loss': 0.0321, 'learning_rate': 4.2366e-06, 'epoch': 1.44, 'throughput': 614.84}
[INFO|callbacks.py:310] 2024-07-16 09:18:49,231 >> {'loss': 0.0452, 'learning_rate': 4.2050e-06, 'epoch': 1.47, 'throughput': 615.11}
[INFO|callbacks.py:310] 2024-07-16 09:19:00,331 >> {'loss': 0.0915, 'learning_rate': 4.1728e-06, 'epoch': 1.49, 'throughput': 615.02}
[INFO|callbacks.py:310] 2024-07-16 09:19:11,424 >> {'loss': 0.0651, 'learning_rate': 4.1401e-06, 'epoch': 1.52, 'throughput': 614.81}
[INFO|callbacks.py:310] 2024-07-16 09:19:22,545 >> {'loss': 0.0868, 'learning_rate': 4.1070e-06, 'epoch': 1.54, 'throughput': 614.92}
[INFO|callbacks.py:310] 2024-07-16 09:19:33,666 >> {'loss': 0.0554, 'learning_rate': 4.0733e-06, 'epoch': 1.57, 'throughput': 615.06}
[INFO|callbacks.py:310] 2024-07-16 09:19:44,774 >> {'loss': 0.0336, 'learning_rate': 4.0392e-06, 'epoch': 1.59, 'throughput': 615.29}
[INFO|callbacks.py:310] 2024-07-16 09:19:55,885 >> {'loss': 0.0455, 'learning_rate': 4.0045e-06, 'epoch': 1.62, 'throughput': 615.69}
[INFO|callbacks.py:310] 2024-07-16 09:20:07,002 >> {'loss': 0.0406, 'learning_rate': 3.9695e-06, 'epoch': 1.65, 'throughput': 615.45}
[INFO|callbacks.py:310] 2024-07-16 09:20:18,095 >> {'loss': 0.0461, 'learning_rate': 3.9339e-06, 'epoch': 1.67, 'throughput': 615.37}
[INFO|callbacks.py:310] 2024-07-16 09:20:29,180 >> {'loss': 0.0466, 'learning_rate': 3.8980e-06, 'epoch': 1.70, 'throughput': 615.10}
[INFO|callbacks.py:310] 2024-07-16 09:20:40,282 >> {'loss': 0.0382, 'learning_rate': 3.8616e-06, 'epoch': 1.72, 'throughput': 615.23}
[INFO|callbacks.py:310] 2024-07-16 09:20:51,381 >> {'loss': 0.0426, 'learning_rate': 3.8248e-06, 'epoch': 1.75, 'throughput': 614.90}
[INFO|callbacks.py:310] 2024-07-16 09:21:02,489 >> {'loss': 0.0264, 'learning_rate': 3.7876e-06, 'epoch': 1.77, 'throughput': 615.03}
[INFO|callbacks.py:310] 2024-07-16 09:21:13,594 >> {'loss': 0.0567, 'learning_rate': 3.7500e-06, 'epoch': 1.80, 'throughput': 615.11}
[INFO|callbacks.py:310] 2024-07-16 09:21:24,706 >> {'loss': 0.0688, 'learning_rate': 3.7120e-06, 'epoch': 1.83, 'throughput': 615.35}
[INFO|callbacks.py:310] 2024-07-16 09:21:35,842 >> {'loss': 0.0351, 'learning_rate': 3.6737e-06, 'epoch': 1.85, 'throughput': 614.88}
[INFO|callbacks.py:310] 2024-07-16 09:21:46,947 >> {'loss': 0.0246, 'learning_rate': 3.6350e-06, 'epoch': 1.88, 'throughput': 614.93}
[INFO|callbacks.py:310] 2024-07-16 09:21:58,021 >> {'loss': 0.0364, 'learning_rate': 3.5959e-06, 'epoch': 1.90, 'throughput': 615.23}
[INFO|callbacks.py:310] 2024-07-16 09:22:09,127 >> {'loss': 0.0352, 'learning_rate': 3.5565e-06, 'epoch': 1.93, 'throughput': 615.23}
[INFO|callbacks.py:310] 2024-07-16 09:22:20,219 >> {'loss': 0.0915, 'learning_rate': 3.5168e-06, 'epoch': 1.95, 'throughput': 615.24}
[INFO|callbacks.py:310] 2024-07-16 09:22:31,310 >> {'loss': 0.0327, 'learning_rate': 3.4768e-06, 'epoch': 1.98, 'throughput': 614.95}
[INFO|callbacks.py:310] 2024-07-16 09:22:42,417 >> {'loss': 0.0448, 'learning_rate': 3.4365e-06, 'epoch': 2.01, 'throughput': 615.21}
[INFO|callbacks.py:310] 2024-07-16 09:22:53,536 >> {'loss': 0.0186, 'learning_rate': 3.3959e-06, 'epoch': 2.03, 'throughput': 615.29}
[INFO|callbacks.py:310] 2024-07-16 09:23:04,675 >> {'loss': 0.0342, 'learning_rate': 3.3551e-06, 'epoch': 2.06, 'throughput': 615.30}
[INFO|callbacks.py:310] 2024-07-16 09:23:15,801 >> {'loss': 0.0079, 'learning_rate': 3.3139e-06, 'epoch': 2.08, 'throughput': 615.14}
[INFO|callbacks.py:310] 2024-07-16 09:23:26,896 >> {'loss': 0.0177, 'learning_rate': 3.2725e-06, 'epoch': 2.11, 'throughput': 615.01}
[INFO|callbacks.py:310] 2024-07-16 09:23:37,991 >> {'loss': 0.0139, 'learning_rate': 3.2309e-06, 'epoch': 2.14, 'throughput': 614.74}
[INFO|callbacks.py:310] 2024-07-16 09:23:49,080 >> {'loss': 0.0103, 'learning_rate': 3.1891e-06, 'epoch': 2.16, 'throughput': 615.15}
[INFO|callbacks.py:310] 2024-07-16 09:24:00,169 >> {'loss': 0.0221, 'learning_rate': 3.1470e-06, 'epoch': 2.19, 'throughput': 615.35}
[INFO|callbacks.py:310] 2024-07-16 09:24:11,255 >> {'loss': 0.0021, 'learning_rate': 3.1048e-06, 'epoch': 2.21, 'throughput': 615.26}
[INFO|callbacks.py:310] 2024-07-16 09:24:22,375 >> {'loss': 0.0110, 'learning_rate': 3.0624e-06, 'epoch': 2.24, 'throughput': 615.65}
[INFO|callbacks.py:310] 2024-07-16 09:24:33,470 >> {'loss': 0.0081, 'learning_rate': 3.0198e-06, 'epoch': 2.26, 'throughput': 615.45}
[INFO|callbacks.py:310] 2024-07-16 09:24:44,602 >> {'loss': 0.0149, 'learning_rate': 2.9770e-06, 'epoch': 2.29, 'throughput': 615.35}
[INFO|callbacks.py:310] 2024-07-16 09:24:55,725 >> {'loss': 0.0010, 'learning_rate': 2.9341e-06, 'epoch': 2.32, 'throughput': 615.53}
[INFO|callbacks.py:310] 2024-07-16 09:25:06,826 >> {'loss': 0.0070, 'learning_rate': 2.8911e-06, 'epoch': 2.34, 'throughput': 615.54}
[INFO|callbacks.py:310] 2024-07-16 09:25:17,934 >> {'loss': 0.0089, 'learning_rate': 2.8479e-06, 'epoch': 2.37, 'throughput': 615.48}
[INFO|callbacks.py:310] 2024-07-16 09:25:29,026 >> {'loss': 0.0013, 'learning_rate': 2.8047e-06, 'epoch': 2.39, 'throughput': 615.67}
[INFO|callbacks.py:310] 2024-07-16 09:25:40,116 >> {'loss': 0.0267, 'learning_rate': 2.7613e-06, 'epoch': 2.42, 'throughput': 615.82}
[INFO|callbacks.py:310] 2024-07-16 09:25:51,214 >> {'loss': 0.0171, 'learning_rate': 2.7179e-06, 'epoch': 2.44, 'throughput': 615.76}
[INFO|callbacks.py:310] 2024-07-16 09:26:02,342 >> {'loss': 0.0375, 'learning_rate': 2.6744e-06, 'epoch': 2.47, 'throughput': 615.50}
[INFO|callbacks.py:310] 2024-07-16 09:26:13,469 >> {'loss': 0.0101, 'learning_rate': 2.6308e-06, 'epoch': 2.50, 'throughput': 615.37}
[INFO|callbacks.py:310] 2024-07-16 09:26:24,600 >> {'loss': 0.0282, 'learning_rate': 2.5872e-06, 'epoch': 2.52, 'throughput': 615.50}
[INFO|callbacks.py:310] 2024-07-16 09:26:35,708 >> {'loss': 0.0069, 'learning_rate': 2.5436e-06, 'epoch': 2.55, 'throughput': 615.47}
[INFO|callbacks.py:310] 2024-07-16 09:26:46,803 >> {'loss': 0.0135, 'learning_rate': 2.5000e-06, 'epoch': 2.57, 'throughput': 615.66}
[INFO|callbacks.py:310] 2024-07-16 09:26:57,903 >> {'loss': 0.0062, 'learning_rate': 2.4564e-06, 'epoch': 2.60, 'throughput': 615.71}
[INFO|callbacks.py:310] 2024-07-16 09:27:08,991 >> {'loss': 0.0050, 'learning_rate': 2.4128e-06, 'epoch': 2.62, 'throughput': 615.56}
[INFO|callbacks.py:310] 2024-07-16 09:27:20,085 >> {'loss': 0.0285, 'learning_rate': 2.3692e-06, 'epoch': 2.65, 'throughput': 615.65}
[INFO|callbacks.py:310] 2024-07-16 09:27:31,191 >> {'loss': 0.0225, 'learning_rate': 2.3256e-06, 'epoch': 2.68, 'throughput': 615.86}
[INFO|callbacks.py:310] 2024-07-16 09:27:42,299 >> {'loss': 0.0280, 'learning_rate': 2.2821e-06, 'epoch': 2.70, 'throughput': 615.69}
[INFO|callbacks.py:310] 2024-07-16 09:27:53,416 >> {'loss': 0.0176, 'learning_rate': 2.2387e-06, 'epoch': 2.73, 'throughput': 615.60}
[INFO|callbacks.py:310] 2024-07-16 09:28:04,554 >> {'loss': 0.0047, 'learning_rate': 2.1953e-06, 'epoch': 2.75, 'throughput': 615.36}
[INFO|callbacks.py:310] 2024-07-16 09:28:15,674 >> {'loss': 0.0135, 'learning_rate': 2.1521e-06, 'epoch': 2.78, 'throughput': 615.25}
[INFO|callbacks.py:310] 2024-07-16 09:28:26,766 >> {'loss': 0.0044, 'learning_rate': 2.1089e-06, 'epoch': 2.80, 'throughput': 615.51}
[INFO|callbacks.py:310] 2024-07-16 09:28:37,852 >> {'loss': 0.0252, 'learning_rate': 2.0659e-06, 'epoch': 2.83, 'throughput': 615.50}
[INFO|callbacks.py:310] 2024-07-16 09:28:48,945 >> {'loss': 0.0249, 'learning_rate': 2.0230e-06, 'epoch': 2.86, 'throughput': 615.61}
[INFO|callbacks.py:310] 2024-07-16 09:29:00,043 >> {'loss': 0.0146, 'learning_rate': 1.9802e-06, 'epoch': 2.88, 'throughput': 615.75}
[INFO|callbacks.py:310] 2024-07-16 09:29:11,162 >> {'loss': 0.0044, 'learning_rate': 1.9376e-06, 'epoch': 2.91, 'throughput': 615.69}
[INFO|callbacks.py:310] 2024-07-16 09:29:22,253 >> {'loss': 0.0054, 'learning_rate': 1.8952e-06, 'epoch': 2.93, 'throughput': 615.71}
[INFO|callbacks.py:310] 2024-07-16 09:29:33,390 >> {'loss': 0.0106, 'learning_rate': 1.8530e-06, 'epoch': 2.96, 'throughput': 615.58}
[INFO|callbacks.py:310] 2024-07-16 09:29:44,539 >> {'loss': 0.0167, 'learning_rate': 1.8109e-06, 'epoch': 2.98, 'throughput': 615.53}
[INFO|callbacks.py:310] 2024-07-16 09:29:55,648 >> {'loss': 0.0090, 'learning_rate': 1.7691e-06, 'epoch': 3.01, 'throughput': 615.55}
[INFO|callbacks.py:310] 2024-07-16 09:30:06,727 >> {'loss': 0.0024, 'learning_rate': 1.7275e-06, 'epoch': 3.04, 'throughput': 615.66}
[INFO|callbacks.py:310] 2024-07-16 09:30:17,830 >> {'loss': 0.0235, 'learning_rate': 1.6861e-06, 'epoch': 3.06, 'throughput': 615.60}
[INFO|callbacks.py:310] 2024-07-16 09:30:28,918 >> {'loss': 0.0179, 'learning_rate': 1.6449e-06, 'epoch': 3.09, 'throughput': 615.53}
[INFO|callbacks.py:310] 2024-07-16 09:30:40,012 >> {'loss': 0.0059, 'learning_rate': 1.6041e-06, 'epoch': 3.11, 'throughput': 615.35}
[INFO|callbacks.py:310] 2024-07-16 09:30:51,146 >> {'loss': 0.0017, 'learning_rate': 1.5635e-06, 'epoch': 3.14, 'throughput': 615.08}
[INFO|callbacks.py:310] 2024-07-16 09:31:02,257 >> {'loss': 0.0018, 'learning_rate': 1.5232e-06, 'epoch': 3.16, 'throughput': 615.02}
[INFO|callbacks.py:310] 2024-07-16 09:31:13,381 >> {'loss': 0.0032, 'learning_rate': 1.4832e-06, 'epoch': 3.19, 'throughput': 615.23}
[INFO|callbacks.py:310] 2024-07-16 09:31:24,512 >> {'loss': 0.0019, 'learning_rate': 1.4435e-06, 'epoch': 3.22, 'throughput': 615.29}
[INFO|callbacks.py:310] 2024-07-16 09:31:35,616 >> {'loss': 0.0014, 'learning_rate': 1.4041e-06, 'epoch': 3.24, 'throughput': 615.27}
[INFO|callbacks.py:310] 2024-07-16 09:31:46,705 >> {'loss': 0.0052, 'learning_rate': 1.3650e-06, 'epoch': 3.27, 'throughput': 615.45}
[INFO|callbacks.py:310] 2024-07-16 09:31:57,796 >> {'loss': 0.0005, 'learning_rate': 1.3263e-06, 'epoch': 3.29, 'throughput': 615.55}
[INFO|callbacks.py:310] 2024-07-16 09:32:08,900 >> {'loss': 0.0131, 'learning_rate': 1.2880e-06, 'epoch': 3.32, 'throughput': 615.52}
[INFO|callbacks.py:310] 2024-07-16 09:32:19,991 >> {'loss': 0.0009, 'learning_rate': 1.2500e-06, 'epoch': 3.34, 'throughput': 615.54}
[INFO|callbacks.py:310] 2024-07-16 09:32:31,110 >> {'loss': 0.0057, 'learning_rate': 1.2124e-06, 'epoch': 3.37, 'throughput': 615.63}
[INFO|callbacks.py:310] 2024-07-16 09:32:42,232 >> {'loss': 0.0002, 'learning_rate': 1.1752e-06, 'epoch': 3.40, 'throughput': 615.53}
[INFO|callbacks.py:310] 2024-07-16 09:32:53,354 >> {'loss': 0.0002, 'learning_rate': 1.1384e-06, 'epoch': 3.42, 'throughput': 615.37}
[INFO|callbacks.py:310] 2024-07-16 09:33:04,458 >> {'loss': 0.0145, 'learning_rate': 1.1020e-06, 'epoch': 3.45, 'throughput': 615.56}
[INFO|callbacks.py:310] 2024-07-16 09:33:15,548 >> {'loss': 0.0034, 'learning_rate': 1.0661e-06, 'epoch': 3.47, 'throughput': 615.59}
[INFO|callbacks.py:310] 2024-07-16 09:33:26,645 >> {'loss': 0.0156, 'learning_rate': 1.0305e-06, 'epoch': 3.50, 'throughput': 615.43}
[INFO|callbacks.py:310] 2024-07-16 09:33:37,738 >> {'loss': 0.0013, 'learning_rate': 9.9546e-07, 'epoch': 3.52, 'throughput': 615.59}
[INFO|callbacks.py:310] 2024-07-16 09:33:48,828 >> {'loss': 0.0007, 'learning_rate': 9.6085e-07, 'epoch': 3.55, 'throughput': 615.56}
[INFO|callbacks.py:310] 2024-07-16 09:33:59,926 >> {'loss': 0.0005, 'learning_rate': 9.2670e-07, 'epoch': 3.58, 'throughput': 615.58}
[INFO|callbacks.py:310] 2024-07-16 09:34:11,043 >> {'loss': 0.0034, 'learning_rate': 8.9303e-07, 'epoch': 3.60, 'throughput': 615.52}
[INFO|callbacks.py:310] 2024-07-16 09:34:22,149 >> {'loss': 0.0001, 'learning_rate': 8.5985e-07, 'epoch': 3.63, 'throughput': 615.41}
[INFO|callbacks.py:310] 2024-07-16 09:34:33,264 >> {'loss': 0.0010, 'learning_rate': 8.2717e-07, 'epoch': 3.65, 'throughput': 615.49}
[INFO|callbacks.py:310] 2024-07-16 09:34:44,385 >> {'loss': 0.0123, 'learning_rate': 7.9500e-07, 'epoch': 3.68, 'throughput': 615.44}
[INFO|callbacks.py:310] 2024-07-16 09:34:55,486 >> {'loss': 0.0002, 'learning_rate': 7.6335e-07, 'epoch': 3.70, 'throughput': 615.35}
[INFO|callbacks.py:310] 2024-07-16 09:35:06,571 >> {'loss': 0.0110, 'learning_rate': 7.3223e-07, 'epoch': 3.73, 'throughput': 615.39}
[INFO|callbacks.py:310] 2024-07-16 09:35:17,657 >> {'loss': 0.0008, 'learning_rate': 7.0165e-07, 'epoch': 3.76, 'throughput': 615.17}
[INFO|callbacks.py:310] 2024-07-16 09:35:28,737 >> {'loss': 0.0003, 'learning_rate': 6.7162e-07, 'epoch': 3.78, 'throughput': 615.50}
[INFO|callbacks.py:310] 2024-07-16 09:35:39,839 >> {'loss': 0.0018, 'learning_rate': 6.4214e-07, 'epoch': 3.81, 'throughput': 615.55}
[INFO|callbacks.py:310] 2024-07-16 09:35:50,950 >> {'loss': 0.0016, 'learning_rate': 6.1323e-07, 'epoch': 3.83, 'throughput': 615.59}
[INFO|callbacks.py:310] 2024-07-16 09:36:02,073 >> {'loss': 0.0021, 'learning_rate': 5.8489e-07, 'epoch': 3.86, 'throughput': 615.56}
[INFO|callbacks.py:310] 2024-07-16 09:36:13,188 >> {'loss': 0.0001, 'learning_rate': 5.5714e-07, 'epoch': 3.88, 'throughput': 615.68}
[INFO|callbacks.py:310] 2024-07-16 09:36:24,312 >> {'loss': 0.0001, 'learning_rate': 5.2997e-07, 'epoch': 3.91, 'throughput': 615.50}
[INFO|callbacks.py:310] 2024-07-16 09:36:35,410 >> {'loss': 0.0003, 'learning_rate': 5.0341e-07, 'epoch': 3.94, 'throughput': 615.45}
[INFO|callbacks.py:310] 2024-07-16 09:36:46,505 >> {'loss': 0.0002, 'learning_rate': 4.7746e-07, 'epoch': 3.96, 'throughput': 615.52}
[INFO|callbacks.py:310] 2024-07-16 09:36:57,590 >> {'loss': 0.0002, 'learning_rate': 4.5212e-07, 'epoch': 3.99, 'throughput': 615.41}
[INFO|callbacks.py:310] 2024-07-16 09:37:08,665 >> {'loss': 0.0000, 'learning_rate': 4.2741e-07, 'epoch': 4.01, 'throughput': 615.56}
[INFO|callbacks.py:310] 2024-07-16 09:37:19,763 >> {'loss': 0.0003, 'learning_rate': 4.0332e-07, 'epoch': 4.04, 'throughput': 615.58}
[INFO|callbacks.py:310] 2024-07-16 09:37:30,878 >> {'loss': 0.0002, 'learning_rate': 3.7988e-07, 'epoch': 4.06, 'throughput': 615.55}
[INFO|callbacks.py:310] 2024-07-16 09:37:41,999 >> {'loss': 0.0001, 'learning_rate': 3.5708e-07, 'epoch': 4.09, 'throughput': 615.40}
[INFO|callbacks.py:310] 2024-07-16 09:37:53,113 >> {'loss': 0.0001, 'learning_rate': 3.3494e-07, 'epoch': 4.12, 'throughput': 615.53}
[INFO|callbacks.py:310] 2024-07-16 09:38:04,234 >> {'loss': 0.0001, 'learning_rate': 3.1345e-07, 'epoch': 4.14, 'throughput': 615.52}
[INFO|callbacks.py:310] 2024-07-16 09:38:15,321 >> {'loss': 0.0000, 'learning_rate': 2.9263e-07, 'epoch': 4.17, 'throughput': 615.59}
[INFO|callbacks.py:310] 2024-07-16 09:38:26,408 >> {'loss': 0.0001, 'learning_rate': 2.7248e-07, 'epoch': 4.19, 'throughput': 615.69}
[INFO|callbacks.py:310] 2024-07-16 09:38:37,489 >> {'loss': 0.0000, 'learning_rate': 2.5301e-07, 'epoch': 4.22, 'throughput': 615.71}
[INFO|callbacks.py:310] 2024-07-16 09:38:48,575 >> {'loss': 0.0001, 'learning_rate': 2.3423e-07, 'epoch': 4.24, 'throughput': 615.55}
[INFO|callbacks.py:310] 2024-07-16 09:38:59,677 >> {'loss': 0.0001, 'learning_rate': 2.1614e-07, 'epoch': 4.27, 'throughput': 615.61}
[INFO|callbacks.py:310] 2024-07-16 09:39:10,799 >> {'loss': 0.0002, 'learning_rate': 1.9874e-07, 'epoch': 4.30, 'throughput': 615.64}
[INFO|callbacks.py:310] 2024-07-16 09:39:21,928 >> {'loss': 0.0002, 'learning_rate': 1.8204e-07, 'epoch': 4.32, 'throughput': 615.58}
[INFO|callbacks.py:310] 2024-07-16 09:39:33,061 >> {'loss': 0.0001, 'learning_rate': 1.6605e-07, 'epoch': 4.35, 'throughput': 615.46}
[INFO|callbacks.py:310] 2024-07-16 09:39:44,166 >> {'loss': 0.0001, 'learning_rate': 1.5077e-07, 'epoch': 4.37, 'throughput': 615.43}
[INFO|callbacks.py:310] 2024-07-16 09:39:55,251 >> {'loss': 0.0115, 'learning_rate': 1.3620e-07, 'epoch': 4.40, 'throughput': 615.48}
[INFO|callbacks.py:310] 2024-07-16 09:40:06,330 >> {'loss': 0.0005, 'learning_rate': 1.2236e-07, 'epoch': 4.42, 'throughput': 615.47}
[INFO|callbacks.py:310] 2024-07-16 09:40:17,431 >> {'loss': 0.0003, 'learning_rate': 1.0924e-07, 'epoch': 4.45, 'throughput': 615.57}
[INFO|callbacks.py:310] 2024-07-16 09:40:28,525 >> {'loss': 0.0050, 'learning_rate': 9.6846e-08, 'epoch': 4.48, 'throughput': 615.49}
[INFO|callbacks.py:310] 2024-07-16 09:40:39,640 >> {'loss': 0.0003, 'learning_rate': 8.5185e-08, 'epoch': 4.50, 'throughput': 615.38}
[INFO|callbacks.py:310] 2024-07-16 09:40:50,749 >> {'loss': 0.0008, 'learning_rate': 7.4261e-08, 'epoch': 4.53, 'throughput': 615.27}
[INFO|callbacks.py:310] 2024-07-16 09:41:01,862 >> {'loss': 0.0000, 'learning_rate': 6.4075e-08, 'epoch': 4.55, 'throughput': 615.37}
[INFO|callbacks.py:310] 2024-07-16 09:41:12,986 >> {'loss': 0.0000, 'learning_rate': 5.4631e-08, 'epoch': 4.58, 'throughput': 615.39}
[INFO|callbacks.py:310] 2024-07-16 09:41:24,079 >> {'loss': 0.0042, 'learning_rate': 4.5932e-08, 'epoch': 4.60, 'throughput': 615.48}
[INFO|callbacks.py:310] 2024-07-16 09:41:35,169 >> {'loss': 0.0004, 'learning_rate': 3.7981e-08, 'epoch': 4.63, 'throughput': 615.54}
[INFO|callbacks.py:310] 2024-07-16 09:41:46,249 >> {'loss': 0.0001, 'learning_rate': 3.0779e-08, 'epoch': 4.66, 'throughput': 615.42}
[INFO|callbacks.py:310] 2024-07-16 09:41:57,352 >> {'loss': 0.0001, 'learning_rate': 2.4330e-08, 'epoch': 4.68, 'throughput': 615.30}
[INFO|callbacks.py:310] 2024-07-16 09:42:08,449 >> {'loss': 0.0004, 'learning_rate': 1.8635e-08, 'epoch': 4.71, 'throughput': 615.12}
[INFO|callbacks.py:310] 2024-07-16 09:42:19,548 >> {'loss': 0.0000, 'learning_rate': 1.3695e-08, 'epoch': 4.73, 'throughput': 615.05}
[INFO|callbacks.py:310] 2024-07-16 09:42:30,662 >> {'loss': 0.0009, 'learning_rate': 9.5133e-09, 'epoch': 4.76, 'throughput': 615.11}
[INFO|callbacks.py:310] 2024-07-16 09:42:41,790 >> {'loss': 0.0001, 'learning_rate': 6.0899e-09, 'epoch': 4.78, 'throughput': 615.12}
[INFO|callbacks.py:310] 2024-07-16 09:42:52,921 >> {'loss': 0.0004, 'learning_rate': 3.4262e-09, 'epoch': 4.81, 'throughput': 615.30}
[INFO|callbacks.py:310] 2024-07-16 09:43:04,012 >> {'loss': 0.0002, 'learning_rate': 1.5229e-09, 'epoch': 4.84, 'throughput': 615.28}
[INFO|callbacks.py:310] 2024-07-16 09:43:15,108 >> {'loss': 0.0000, 'learning_rate': 3.8076e-10, 'epoch': 4.86, 'throughput': 615.26}
[INFO|callbacks.py:310] 2024-07-16 09:43:26,201 >> {'loss': 0.0001, 'learning_rate': 0.0000e+00, 'epoch': 4.89, 'throughput': 615.25}
[INFO|trainer.py:3478] 2024-07-16 09:43:32,570 >> Saving model checkpoint to saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/checkpoint-190
[INFO|configuration_utils.py:472] 2024-07-16 09:43:32,573 >> Configuration saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/checkpoint-190/config.json
[INFO|configuration_utils.py:769] 2024-07-16 09:43:32,573 >> Configuration saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/checkpoint-190/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-16 09:43:46,233 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/checkpoint-190/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2574] 2024-07-16 09:43:46,233 >> tokenizer config file saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/checkpoint-190/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-16 09:43:46,234 >> Special tokens file saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/checkpoint-190/special_tokens_map.json
[INFO|trainer.py:2383] 2024-07-16 09:44:16,328 >>
Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|trainer.py:3478] 2024-07-16 09:44:22,736 >> Saving model checkpoint to saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2
[INFO|configuration_utils.py:472] 2024-07-16 09:44:22,738 >> Configuration saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/config.json
[INFO|configuration_utils.py:769] 2024-07-16 09:44:22,739 >> Configuration saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-16 09:44:36,499 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2574] 2024-07-16 09:44:36,499 >> tokenizer config file saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-16 09:44:36,499 >> Special tokens file saved in saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2/special_tokens_map.json
[WARNING|ploting.py:89] 2024-07-16 09:44:37,565 >> No metric eval_loss to plot.
[WARNING|ploting.py:89] 2024-07-16 09:44:37,565 >> No metric eval_accuracy to plot.
[INFO|modelcard.py:449] 2024-07-16 09:44:37,565 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
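
The full fine-tuned checkpoint written above is a regular Hugging Face model directory; a minimal sketch of loading it back for inference, assuming the run directory path from the save messages (sampling settings come from the saved generation_config.json):

    # Sketch only: load the checkpoint saved at the end of this run.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    ckpt_dir = "saves/LLaMA2-7B-Chat/full/train_2024-07-16-09-05-28_llama2"

    tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
    model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.bfloat16)

    inputs = tokenizer("Hello", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))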