2024-09-26 05:19:52,278 INFO MainThread:21647 [wandb_setup.py:_flush():77] Current SDK version is 0.18.1
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_setup.py:_flush():77] Configure stats pid to 21647
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_setup.py:_flush():77] Loading settings from /root/.config/wandb/settings
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_setup.py:_flush():77] Loading settings from /workspace/nanoT5/outputs/2024-09-26/05-19-51/wandb/settings
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_setup.py:_flush():77] Loading settings from environment variables: {'project': 'nanoT5'}
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_setup.py:_flush():77] Applying setup settings: {'mode': 'online', '_disable_service': None}
2024-09-26 05:19:52,279 WARNING MainThread:21647 [wandb_setup.py:_flush():77] Could not find program at -m nanoT5.main
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_setup.py:_flush():77] Inferring run settings from compute environment: {'program_relpath': None, 'program': '-m nanoT5.main'}
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_setup.py:_flush():77] Applying login settings: {}
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_init.py:_log_setup():532] Logging user logs to /workspace/nanoT5/outputs/2024-09-26/05-19-51/wandb/run-20240926_051952-6q92wn34/logs/debug.log
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_init.py:_log_setup():533] Logging internal logs to /workspace/nanoT5/outputs/2024-09-26/05-19-51/wandb/run-20240926_051952-6q92wn34/logs/debug-internal.log
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_init.py:init():616] calling init triggers
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_init.py:init():623] wandb.init called with sweep_config: {} config: {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 34534, 'model': {'klass': 'hf_t5', 'name': 'pszemraj/tFINE-850m-24x24-512ctx', 'overwrite': {'dropout_rate': 0.0}, 'checkpoint_path': '', 'random_init': False, 'compile': True}, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 16}, 'optim': {'name': 'adamwscale', 'base_lr': 0.01, 'batch_size': 128, 'total_steps': 20000, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.0, 'grad_clip': 1.0, 'grad_acc': 8, 'final_cosine': 2e-05}, 'eval': {'every_steps': 1000000000, 'steps': 500}, 'checkpoint': {'every_steps': 2500}, 'logging': {'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'pszemraj', 'tags': ['24x24', '1024'], 'mode': 'online'}, 'every_steps': 25, 'grad_l2': True, 'weights_l2': True}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/outputs/2024-09-26/05-19-51'}
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_init.py:init():666] starting backend
2024-09-26 05:19:52,279 INFO MainThread:21647 [wandb_init.py:init():670] setting up manager
2024-09-26 05:19:52,285 INFO MainThread:21647 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-09-26 05:19:52,286 INFO MainThread:21647 [wandb_init.py:init():678] backend started and connected
2024-09-26 05:19:52,293 INFO MainThread:21647 [wandb_init.py:init():773] updated telemetry
2024-09-26 05:19:52,299 INFO MainThread:21647 [wandb_init.py:init():806] communicating run to backend with 90.0 second timeout
2024-09-26 05:19:52,503 INFO MainThread:21647 [wandb_init.py:init():857] starting run threads in backend
2024-09-26 05:19:52,643 INFO MainThread:21647 [wandb_run.py:_console_start():2459] atexit reg
2024-09-26 05:19:52,643 INFO MainThread:21647 [wandb_run.py:_redirect():2307] redirect: wrap_raw
2024-09-26 05:19:52,643 INFO MainThread:21647 [wandb_run.py:_redirect():2372] Wrapping output streams.
2024-09-26 05:19:52,643 INFO MainThread:21647 [wandb_run.py:_redirect():2397] Redirects installed.
2024-09-26 05:19:52,646 INFO MainThread:21647 [wandb_init.py:init():900] run started, returning control to user process
2024-09-26 05:20:47,956 INFO MainThread:21647 [wandb_run.py:_config_callback():1388] config_cb None None {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 34534, 'model': {'klass': 'hf_t5', 'name': 'pszemraj/tFINE-850m-24x24-512ctx', 'overwrite': {'dropout_rate': 0.0}, 'checkpoint_path': '', 'random_init': False, 'compile': True}, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 16, 'before_mask_input_length': 1137, 'target_length': 229}, 'optim': {'name': 'adamwscale', 'base_lr': 0.01, 'batch_size': 128, 'total_steps': 20000, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.0, 'grad_clip': 1.0, 'grad_acc': 8, 'final_cosine': 2e-05}, 'eval': {'every_steps': 1000000000, 'steps': 500, 'corrected_steps': 500}, 'checkpoint': {'every_steps': 2500}, 'logging': {'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'pszemraj', 'tags': ['24x24', '1024'], 'mode': 'online'}, 'every_steps': 25, 'grad_l2': True, 'weights_l2': True}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/outputs/2024-09-26/05-19-51', 'n_all_param': 853929472}
2024-09-27 17:10:18,389 INFO MainThread:21647 [wandb_run.py:_finish():2158] finishing run pszemraj/nanoT5/6q92wn34
2024-09-27 17:10:18,392 INFO MainThread:21647 [wandb_run.py:_atexit_cleanup():2422] got exitcode: 0
2024-09-27 17:10:18,392 INFO MainThread:21647 [wandb_run.py:_restore():2404] restore
2024-09-27 17:10:18,393 INFO MainThread:21647 [wandb_run.py:_restore():2410] restore done
2024-09-27 17:10:20,554 INFO MainThread:21647 [wandb_run.py:_footer_history_summary_info():4037] rendering history
2024-09-27 17:10:20,556 INFO MainThread:21647 [wandb_run.py:_footer_history_summary_info():4069] rendering summary
2024-09-27 17:10:20,567 INFO MainThread:21647 [wandb_run.py:_footer_sync_info():3996] logging synced files