OPT-1.3b-Chat / training.log
[2023-09-03 18:33:54,471] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-09-03 18:33:59.289918: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-09-03 18:34:00,089] [WARNING] [runner.py:203:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-09-03 18:34:00,089] [INFO] [runner.py:570:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --model_name_or_path facebook/opt-1.3b --gradient_accumulation_steps 8 --gradient_checkpointing --lora_dim 128 --zero_stage 0 --enable_tensorboard --tensorboard_path /content/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b --deepspeed --output_dir /content/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b
[2023-09-03 18:34:02,103] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-09-03 18:34:06.217869: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-09-03 18:34:07,000] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.15.5-1+cuda11.8
[2023-09-03 18:34:07,000] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.15.5-1
[2023-09-03 18:34:07,000] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.15.5-1
[2023-09-03 18:34:07,000] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2023-09-03 18:34:07,000] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.15.5-1+cuda11.8
[2023-09-03 18:34:07,000] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2023-09-03 18:34:07,000] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.15.5-1
[2023-09-03 18:34:07,000] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]}
[2023-09-03 18:34:07,000] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-09-03 18:34:07,000] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-09-03 18:34:07,000] [INFO] [launch.py:163:main] dist_world_size=1
[2023-09-03 18:34:07,000] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0
2023-09-03 18:34:10.336810: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-09-03 18:34:12,340] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-09-03 18:34:16,742] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-09-03 18:34:16,742] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Downloading (…)okenizer_config.json: 100%|██████████| 685/685 [00:00<00:00, 2.96MB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 653/653 [00:00<00:00, 3.88MB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████| 899k/899k [00:00<00:00, 3.64MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 38.7MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 441/441 [00:00<00:00, 2.67MB/s]
Downloading pytorch_model.bin: 100%|██████████| 2.63G/2.63G [00:07<00:00, 345MB/s]
Downloading (…)neration_config.json: 100%|██████████| 137/137 [00:00<00:00, 676kB/s]
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embeding dimension will be 50272. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
Downloading readme: 100%|██████████| 530/530 [00:00<00:00, 3.84MB/s]
Downloading metadata: 100%|██████████| 926/926 [00:00<00:00, 6.44MB/s]
Downloading data: 100%|██████████| 68.4M/68.4M [00:02<00:00, 25.0MB/s]
Downloading data: 100%|██████████| 4.61M/4.61M [00:00<00:00, 16.2MB/s]
Downloading data files: 100%|██████████| 2/2 [00:03<00:00, 1.51s/it]
Extracting data files: 100%|██████████| 2/2 [00:00<00:00, 1938.22it/s]
Generating train split: 100%|██████████| 76256/76256 [00:00<00:00, 304742.38 examples/s]
Generating test split: 100%|██████████| 5103/5103 [00:00<00:00, 333939.73 examples/s]
Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py310_cu118/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -std=c++17 -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/usr/local/lib/python3.10/dist-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 31.66241216659546 seconds
[2023-09-03 18:35:52,246] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.10.2, git-hash=unknown, git-branch=unknown
[2023-09-03 18:35:52,246] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
[2023-09-03 18:35:53,727] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-09-03 18:35:53,729] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer
[2023-09-03 18:35:53,729] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-09-03 18:35:53,779] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2023-09-03 18:35:53,779] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 optimizer with dynamic loss scale
[2023-09-03 18:35:53,858] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-09-03 18:35:53,859] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-09-03 18:35:53,859] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x793c88ee6980>
[2023-09-03 18:35:53,859] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.001, 0.0005, 0.001], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:35:53,860] [INFO] [config.py:963:print] DeepSpeedEngine configuration:
[2023-09-03 18:35:53,860] [INFO] [config.py:967:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2023-09-03 18:35:53,860] [INFO] [config.py:967:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-09-03 18:35:53,860] [INFO] [config.py:967:print] amp_enabled .................. False
[2023-09-03 18:35:53,860] [INFO] [config.py:967:print] amp_params ................... False
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] autotuning_config ............ {
    "enabled": false,
    "start_step": null,
    "end_step": null,
    "metric_path": null,
    "arg_mappings": null,
    "metric": "throughput",
    "model_info": null,
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps",
    "overwrite": true,
    "fast": true,
    "start_profile_step": 3,
    "end_profile_step": 5,
    "tuner_type": "gridsearch",
    "tuner_early_stopping": 5,
    "tuner_num_trials": 50,
    "model_info_path": null,
    "mp_size": 1,
    "max_train_batch_size": null,
    "min_train_batch_size": 1,
    "max_train_micro_batch_size_per_gpu": 1.024000e+03,
    "min_train_micro_batch_size_per_gpu": 1,
    "num_tuning_micro_batch_sizes": 3
}
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] bfloat16_enabled ............. False
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] checkpoint_parallel_write_pipeline False
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] checkpoint_tag_validation_enabled True
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] checkpoint_tag_validation_fail False
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x793bea59ceb0>
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] communication_data_type ...... None
[2023-09-03 18:35:53,862] [INFO] [config.py:967:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] curriculum_enabled_legacy .... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] curriculum_params_legacy ..... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] data_efficiency_enabled ...... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] dataloader_drop_last ......... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] disable_allgather ............ False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] dump_state ................... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_enabled ........... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_gas_boundary_resolution 1
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_layer_num ......... 0
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_max_iter .......... 100
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_stability ......... 1e-06
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_tol ............... 0.01
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] eigenvalue_verbose ........... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] elasticity_enabled ........... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] flops_profiler_config ........ {
    "enabled": false,
    "recompute_fwd_factor": 0.0,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] fp16_auto_cast ............... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] fp16_enabled ................. True
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] fp16_master_weights_and_gradients False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] global_rank .................. 0
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] grad_accum_dtype ............. None
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] gradient_accumulation_steps .. 8
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] gradient_clipping ............ 1.0
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] gradient_predivide_factor .... 1.0
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] initial_dynamic_scale ........ 65536
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] load_universal_checkpoint .... False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] loss_scale ................... 0
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] memory_breakdown ............. False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] mics_hierarchial_params_gather False
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] mics_shard_size .............. -1
[2023-09-03 18:35:53,863] [INFO] [config.py:967:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=True, output_path='/content/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b/ds_tensorboard_logs/', job_name='step1_model_tensorboard') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=True
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] nebula_config ................ {
    "enabled": false,
    "persistent_storage_path": null,
    "persistent_time_interval": 100,
    "num_of_version_in_retention": 2,
    "enable_nebula_load": true,
    "load_path": null
}
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] optimizer_legacy_fusion ...... False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] optimizer_name ............... None
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] optimizer_params ............. None
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] pld_enabled .................. False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] pld_params ................... False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] prescale_gradients ........... False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] scheduler_name ............... None
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] scheduler_params ............. None
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] sparse_attention ............. None
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] sparse_gradients_enabled ..... False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] steps_per_print .............. 10
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] train_batch_size ............. 128
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] train_micro_batch_size_per_gpu 16
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] use_node_local_storage ....... False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] wall_clock_breakdown ......... False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] world_size ................... 1
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] zero_allow_untested_optimizer False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] zero_enabled ................. False
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] zero_force_ds_cpu_optimizer .. True
[2023-09-03 18:35:53,864] [INFO] [config.py:967:print] zero_optimization_stage ...... 0
[2023-09-03 18:35:53,864] [INFO] [config.py:953:print_user_config] json = {
    "train_batch_size": 128,
    "train_micro_batch_size_per_gpu": 16,
    "steps_per_print": 10,
    "zero_optimization": {
        "stage": 0,
        "offload_param": {
            "device": "none"
        },
        "offload_optimizer": {
            "device": "none"
        },
        "stage3_param_persistence_threshold": 1.000000e+04,
        "stage3_max_live_parameters": 3.000000e+07,
        "stage3_prefetch_bucket_size": 3.000000e+07,
        "memory_efficient_linear": false
    },
    "fp16": {
        "enabled": true,
        "loss_scale_window": 100
    },
    "gradient_clipping": 1.0,
    "prescale_gradients": false,
    "wall_clock_breakdown": false,
    "hybrid_engine": {
        "enabled": false,
        "max_out_tokens": 512,
        "inference_tp_size": 1,
        "release_inference_cache": false,
        "pin_parameters": true,
        "tp_gather_partition_size": 8
    },
    "tensorboard": {
        "enabled": true,
        "output_path": "/content/DeepSpeedExamples/applications/DeepSpeed-Chat/output/actor-models/1.3b/ds_tensorboard_logs/",
        "job_name": "step1_model_tensorboard"
    }
}
***** Running training *****
***** Evaluating perplexity, Epoch 0/1 *****
ppl: 4392.14599609375
Beginning of Epoch 1/1, Total Micro Batches 954
Model Parameters: 1.429 B, Latency: 1.02s, TFLOPs: 66.78, Samples/sec: 15.63, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.22, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.21, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:36:16,461] [INFO] [fused_optimizer.py:347:_update_scale]
Grad overflow on iteration 0
[2023-09-03 18:36:16,461] [INFO] [fused_optimizer.py:348:_update_scale] Reducing dynamic loss scale from 65536 to 32768.0
[2023-09-03 18:36:16,461] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. Attempted loss scale: 65536, reducing to 32768.0
Model Parameters: 1.429 B, Latency: 0.86s, TFLOPs: 79.81, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.44, Samples/sec: 19.06, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.24, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.30, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.25, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:36:23,228] [INFO] [fused_optimizer.py:347:_update_scale]
Grad overflow on iteration 1
[2023-09-03 18:36:23,228] [INFO] [fused_optimizer.py:348:_update_scale] Reducing dynamic loss scale from 32768.0 to 16384.0
[2023-09-03 18:36:23,228] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0
Model Parameters: 1.429 B, Latency: 0.86s, TFLOPs: 79.83, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.40, Samples/sec: 19.05, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.18, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.72, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.79, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.31, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.20, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:36:30,017] [INFO] [fused_optimizer.py:347:_update_scale]
Grad overflow on iteration 2
[2023-09-03 18:36:30,017] [INFO] [fused_optimizer.py:348:_update_scale] Reducing dynamic loss scale from 16384.0 to 8192.0
[2023-09-03 18:36:30,017] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0
Model Parameters: 1.429 B, Latency: 0.86s, TFLOPs: 79.81, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.77, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.14, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.14, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.95s, TFLOPs: 71.82, Samples/sec: 16.81, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.40, Samples/sec: 19.05, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.19, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.37, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.15, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.18, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:36:50,496] [INFO] [fused_optimizer.py:347:_update_scale]
Grad overflow on iteration 5
[2023-09-03 18:36:50,496] [INFO] [fused_optimizer.py:348:_update_scale] Reducing dynamic loss scale from 8192.0 to 4096.0
[2023-09-03 18:36:50,496] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. Attempted loss scale: 8192.0, reducing to 4096.0
Model Parameters: 1.429 B, Latency: 0.86s, TFLOPs: 79.60, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.70, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.91s, TFLOPs: 75.52, Samples/sec: 17.67, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.25, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.38, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.14, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.18, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.15, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.37, Samples/sec: 17.87, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.40, Samples/sec: 19.05, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.79, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.14, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.28, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:37:17,791] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=4, lr=[0.0009938441702975688, 0.0004969220851487844, 0.0009938441702975688], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:37:17,794] [INFO] [timer.py:260:stop] epoch=0/micro_step=80/global_step=10, RunningAvgSamplesPerSec=18.831049909747506, CurrSamplesPerSec=18.807653213345517, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.19, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.45, Samples/sec: 19.06, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.14, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.24, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.24, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.16, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.29, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.80, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.24, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.28, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.17, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.92, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.19, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.22, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.52, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.88, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.58, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.15, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.20, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.91, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.18, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.23, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.60, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:38:26,033] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=4, lr=[0.0009567727288213005, 0.0004783863644106502, 0.0009567727288213005], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:38:26,036] [INFO] [timer.py:260:stop] epoch=0/micro_step=160/global_step=20, RunningAvgSamplesPerSec=18.825310270943078, CurrSamplesPerSec=18.81212272756149, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.22, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.83, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.16, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.20, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.12, Samples/sec: 17.81, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.33, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.89, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.79, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.80, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.27, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.93, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.89, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.14, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.26, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.45, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.19, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.36, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.30, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.29, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:39:34,275] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=4, lr=[0.0008885729807284854, 0.0004442864903642427, 0.0008885729807284854], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:39:34,278] [INFO] [timer.py:260:stop] epoch=0/micro_step=240/global_step=30, RunningAvgSamplesPerSec=18.82351155926243, CurrSamplesPerSec=18.834208831789546, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.24, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.56, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.24, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.37, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.25, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.31, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.90, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.30, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.51, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.27, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.27, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.37, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.31, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.40, Samples/sec: 19.05, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.84, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.27, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.18, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.32, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.30, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.30, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.85, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:40:42,504] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=4, lr=[0.0007938926261462366, 0.0003969463130731183, 0.0007938926261462366], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:40:42,507] [INFO] [timer.py:260:stop] epoch=0/micro_step=320/global_step=40, RunningAvgSamplesPerSec=18.823593319102685, CurrSamplesPerSec=18.832310081437228, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.28, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.94, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.94, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.29, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.22, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.32, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.32, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.61, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.94, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.89, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.34, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.30, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.16, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.81, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.13, Samples/sec: 17.81, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.29, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.30, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.18, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.22, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.35, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.36, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:41:50,746] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=4, lr=[0.0006791839747726501, 0.00033959198738632503, 0.0006791839747726501], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:41:50,749] [INFO] [timer.py:260:stop] epoch=0/micro_step=400/global_step=50, RunningAvgSamplesPerSec=18.823006948587473, CurrSamplesPerSec=18.82600414243624, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.11, Samples/sec: 17.81, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.54, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.07, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 75.80, Samples/sec: 17.74, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.28, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.22, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.28, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.28, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.00, Samples/sec: 17.78, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.54, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.14, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.21, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.22, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.21, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.18, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.30, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:42:59,018] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=4, lr=[0.0005522642316338268, 0.0002761321158169134, 0.0005522642316338268], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:42:59,021] [INFO] [timer.py:260:stop] epoch=0/micro_step=480/global_step=60, RunningAvgSamplesPerSec=18.821109620393397, CurrSamplesPerSec=18.832281675761887, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.19, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.70, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.15, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.17, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.14, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.08, Samples/sec: 17.80, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.15, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.32, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.55, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.66, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.26, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.25, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.94, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.93, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.22, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.39, Samples/sec: 19.05, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.09, Samples/sec: 17.80, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.23, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.74, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.46, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.19, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.94, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.92, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:44:07,267] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=4, lr=[0.0004217827674798845, 0.00021089138373994224, 0.0004217827674798845], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:44:07,270] [INFO] [timer.py:260:stop] epoch=0/micro_step=560/global_step=70, RunningAvgSamplesPerSec=18.820836318425595, CurrSamplesPerSec=18.832129739883644, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.26, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.34, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.15, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.46, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.15, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.27, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.33, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.27, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.92, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.09, Samples/sec: 17.81, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.37, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.82, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.71, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 75.64, Samples/sec: 17.70, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.21, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.92, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.27, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 75.68, Samples/sec: 17.71, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.31, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.26, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:45:15,513] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=4, lr=[0.0002966316784621, 0.00014831583923105, 0.0002966316784621], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:45:15,516] [INFO] [timer.py:260:stop] epoch=0/micro_step=640/global_step=80, RunningAvgSamplesPerSec=18.82067254206399, CurrSamplesPerSec=18.820728995880213, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.37, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.32, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.28, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.17, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.34, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.33, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.29, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.34, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.29, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.24, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.33, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.24, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.39, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.14, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.09, Samples/sec: 17.80, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.30, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.87, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.05, Samples/sec: 17.80, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.29, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.83, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.17, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.33, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:46:23,729] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=4, lr=[0.00018533980447508135, 9.266990223754068e-05, 0.00018533980447508135], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:46:23,732] [INFO] [timer.py:260:stop] epoch=0/micro_step=720/global_step=90, RunningAvgSamplesPerSec=18.82138705032387, CurrSamplesPerSec=18.839549715629854, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.40, Samples/sec: 19.05, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.01, Samples/sec: 17.79, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.24, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 75.75, Samples/sec: 17.73, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.36, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.16, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.25, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.91, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.21, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.37, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.86s, TFLOPs: 79.75, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.30, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.21, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.34, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.24, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.25, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.54, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.91, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.45, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.31, Samples/sec: 17.86, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.19, Samples/sec: 19.00, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.13, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:47:31,975] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=4, lr=[9.549150281252633e-05, 4.7745751406263163e-05, 9.549150281252633e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:47:31,978] [INFO] [timer.py:260:stop] epoch=0/micro_step=800/global_step=100, RunningAvgSamplesPerSec=18.82124037075098, CurrSamplesPerSec=18.82521066864151, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.17, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.26, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.31, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.12, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.79, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 75.92, Samples/sec: 17.77, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.32, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.07, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.93, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.17, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.36, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.28, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.30, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.20, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.28, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.33, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.10, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.19, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.33, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:48:19,732] [INFO] [fused_optimizer.py:355:_update_scale] No Grad overflow for 100 iterations
[2023-09-03 18:48:19,732] [INFO] [fused_optimizer.py:356:_update_scale] Increasing dynamic loss scale from 4096.0 to 8192.0
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.23, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.21, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.28, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.93, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.33, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 75.88, Samples/sec: 17.76, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.35, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.11, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
[2023-09-03 18:48:40,227] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=4, lr=[3.3209786751399184e-05, 1.6604893375699592e-05, 3.3209786751399184e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2023-09-03 18:48:40,230] [INFO] [timer.py:260:stop] epoch=0/micro_step=880/global_step=110, RunningAvgSamplesPerSec=18.821008853104154, CurrSamplesPerSec=18.827828329578843, MemAllocated=5.92GB, MaxMemAllocated=11.24GB
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.16, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.29, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.95, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.96, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.15, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.36, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.17, Samples/sec: 17.82, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.23, Samples/sec: 19.01, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.05, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.57, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.26, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.34, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.91s, TFLOPs: 75.21, Samples/sec: 17.60, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.33, Samples/sec: 19.03, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.03, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.25, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.36, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.97, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.18, Samples/sec: 17.83, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.99, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.02, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.04, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.22, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.37, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.09, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.06, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.08, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.90s, TFLOPs: 76.25, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.27, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.91, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 80.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.01, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.07, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.00, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.85s, TFLOPs: 80.72, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.91s, TFLOPs: 75.41, Samples/sec: 17.64, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.28, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512
Model Parameters: 1.429 B, Latency: 0.25s, TFLOPs: 273.72, Samples/sec: 64.05, Time/seq 0.02s, Batch Size: 16, Sequence Length: 512
***** Evaluating perplexity, Epoch 1/1 *****
ppl: 2.149731159210205
saving the final model ...
[2023-09-03 18:50:03,981] [INFO] [launch.py:347:main] Process 2600 exits successfully.
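
Note on reading the throughput lines above: almost every "Model Parameters: ..." line reports a latency of 0.84s to 0.91s and roughly 75 to 81 TFLOPs, while the last one before the perplexity evaluation reports 0.25s and 273.72 TFLOPs. The following is a minimal sketch, not part of the training run, of how such lines could be parsed and the unusual ones set aside before averaging. The function names `parse_throughput` and `split_outliers`, and the threshold of twice the median, are hypothetical choices made for illustration only.

```python
import re
from statistics import median

# Matches the per-micro-step throughput lines as they appear in this log.
LINE_RE = re.compile(
    r"Latency: (?P<latency>[\d.]+)s, TFLOPs: (?P<tflops>[\d.]+), "
    r"Samples/sec: (?P<sps>[\d.]+)"
)


def parse_throughput(lines):
    """Return (latency_s, tflops, samples_per_sec) for every matching line."""
    rows = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            rows.append(
                (float(m["latency"]), float(m["tflops"]), float(m["sps"]))
            )
    return rows


def split_outliers(rows, factor=2.0):
    """Split rows into (typical, outliers) by TFLOPs.

    A row counts as an outlier when its TFLOPs exceed factor * median.
    The factor is an assumed, illustrative threshold.
    """
    med = median(r[1] for r in rows)
    typical = [r for r in rows if r[1] <= factor * med]
    outliers = [r for r in rows if r[1] > factor * med]
    return typical, outliers


# Three throughput lines and one unrelated line, copied from the log above.
sample = [
    "Model Parameters: 1.429 B, Latency: 0.91s, TFLOPs: 75.41, Samples/sec: 17.64, Time/seq 0.06s, Batch Size: 16, Sequence Length: 512",
    "Model Parameters: 1.429 B, Latency: 0.84s, TFLOPs: 81.28, Samples/sec: 19.02, Time/seq 0.05s, Batch Size: 16, Sequence Length: 512",
    "Model Parameters: 1.429 B, Latency: 0.25s, TFLOPs: 273.72, Samples/sec: 64.05, Time/seq 0.02s, Batch Size: 16, Sequence Length: 512",
    "ppl: 2.149731159210205",
]

rows = parse_throughput(sample)
typical, outliers = split_outliers(rows)
print("typical:", typical)
print("outliers:", outliers)
```

To apply the same sketch to a whole log, pass the open file object (or its list of lines) to `parse_throughput` in place of `sample`.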