ul3-base/wandb/debug.log
2024-10-20 18:25:18,064 INFO MainThread:4102 [wandb_setup.py:_flush():79] Current SDK version is 0.18.5
2024-10-20 18:25:18,064 INFO MainThread:4102 [wandb_setup.py:_flush():79] Configure stats pid to 4102
2024-10-20 18:25:18,065 INFO MainThread:4102 [wandb_setup.py:_flush():79] Loading settings from /root/.config/wandb/settings
2024-10-20 18:25:18,065 INFO MainThread:4102 [wandb_setup.py:_flush():79] Loading settings from /workspace/nanoT5/logs/2024-10-20/18-25-17/wandb/settings
2024-10-20 18:25:18,066 INFO MainThread:4102 [wandb_setup.py:_flush():79] Loading settings from environment variables: {}
2024-10-20 18:25:18,066 INFO MainThread:4102 [wandb_setup.py:_flush():79] Applying setup settings: {'mode': 'online', '_disable_service': None}
2024-10-20 18:25:18,067 WARNING MainThread:4102 [wandb_setup.py:_flush():79] Could not find program at -m nanoT5.main
2024-10-20 18:25:18,067 INFO MainThread:4102 [wandb_setup.py:_flush():79] Inferring run settings from compute environment: {'program_relpath': None, 'program': '-m nanoT5.main'}
2024-10-20 18:25:18,068 INFO MainThread:4102 [wandb_setup.py:_flush():79] Applying login settings: {}
2024-10-20 18:25:18,069 INFO MainThread:4102 [wandb_init.py:_log_setup():534] Logging user logs to /workspace/nanoT5/logs/2024-10-20/18-25-17/wandb/run-20241020_182518-i0qk9v3k/logs/debug.log
2024-10-20 18:25:18,071 INFO MainThread:4102 [wandb_init.py:_log_setup():535] Logging internal logs to /workspace/nanoT5/logs/2024-10-20/18-25-17/wandb/run-20241020_182518-i0qk9v3k/logs/debug-internal.log
2024-10-20 18:25:18,071 INFO MainThread:4102 [wandb_init.py:init():621] calling init triggers
2024-10-20 18:25:18,072 INFO MainThread:4102 [wandb_init.py:init():628] wandb.init called with sweep_config: {}
config: {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 93789, 'tokenizer': {'name': 'BEE-spoke-data/hf_slimpajama-6B-28672-BPE-forT5'}, 'working_dir': '/workspace/nanoT5/logs/2024-10-20/18-25-17', 'model': {'liger': True, 'klass': 'local_t5', 'name': 'pszemraj/tFINE-850m-24x24-1024ctx', 'overwrite': {'dropout_rate': 0.0, 'num_decoder_layers': 16, 'num_key_value_heads': 4, 'num_layers': 16, 'use_gqa': True}, 'add_config': {'is_bf16': True}, 'checkpoint_path': '', 'random_init': True, 'compile': True}, 'data': {'multi_task': True, 'NTP': 0.3, 'input_length': 512, 'max_seq_len': 512, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 0}, 'optim': {'name': 'adamwscale', 'base_lr': 0.001, 'batch_size': 128, 'total_steps': 65536, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.01, 'grad_clip': 1.0, 'grad_acc': 16, 'final_cosine': 2e-05}, 'eval': {'every_steps': 500, 'steps': 0}, 'checkpoint': {'every_steps': 1500}, 'logging': {'every_steps': 25, 'grad_l2': True, 'weights_l2': True, 'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'amazingvince', 'tags': ['gqa', 'large', 'e32-d16', '512 ctx'], 'mode': 'online'}}, 'slurm_id': 'none'}
2024-10-20 18:25:18,073 INFO MainThread:4102 [wandb_init.py:init():671] starting backend
2024-10-20 18:25:18,074 INFO MainThread:4102 [wandb_init.py:init():675] sending inform_init request
2024-10-20 18:25:18,121 INFO MainThread:4102 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-10-20 18:25:18,122 INFO MainThread:4102 [wandb_init.py:init():688] backend started and connected
2024-10-20 18:25:18,198 INFO MainThread:4102 [wandb_init.py:init():783] updated telemetry
2024-10-20 18:25:18,256 INFO MainThread:4102 [wandb_init.py:init():816] communicating run to backend with 90.0 second timeout
2024-10-20 18:25:19,558 INFO MainThread:4102 [wandb_init.py:init():867] starting run threads in backend
2024-10-20 18:25:19,755 INFO MainThread:4102 [wandb_run.py:_console_start():2463] atexit reg
2024-10-20 18:25:19,756 INFO MainThread:4102 [wandb_run.py:_redirect():2311] redirect: wrap_raw
2024-10-20 18:25:19,757 INFO MainThread:4102 [wandb_run.py:_redirect():2376] Wrapping output streams.
2024-10-20 18:25:19,759 INFO MainThread:4102 [wandb_run.py:_redirect():2401] Redirects installed.
2024-10-20 18:25:19,763 INFO MainThread:4102 [wandb_init.py:init():911] run started, returning control to user process
2024-10-20 18:25:41,763 INFO MainThread:4102 [wandb_run.py:_config_callback():1390] config_cb None None {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 93789, 'tokenizer': {'name': 'BEE-spoke-data/hf_slimpajama-6B-28672-BPE-forT5'}, 'working_dir': '/workspace/nanoT5/logs/2024-10-20/18-25-17', 'model': {'liger': True, 'klass': 'local_t5', 'name': 'pszemraj/tFINE-850m-24x24-1024ctx', 'overwrite': {'dropout_rate': 0.0, 'num_decoder_layers': 16, 'num_key_value_heads': 4, 'num_layers': 16, 'use_gqa': True}, 'add_config': {'is_bf16': True}, 'checkpoint_path': '', 'random_init': True, 'compile': True}, 'data': {'multi_task': True, 'NTP': 0.3, 'input_length': 512, 'max_seq_len': 512, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 0, 'before_mask_input_length': 568, 'target_length': 114}, 'optim': {'name': 'adamwscale', 'base_lr': 0.001, 'batch_size': 128, 'total_steps': 65536, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.01, 'grad_clip': 1.0, 'grad_acc': 16, 'final_cosine': 2e-05}, 'eval': {'every_steps': 500, 'steps': 0, 'corrected_steps': 0}, 'checkpoint': {'every_steps': 1500}, 'logging': {'every_steps': 25, 'grad_l2': True, 'weights_l2': True, 'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'amazingvince', 'tags': ['gqa', 'large', 'e32-d16', '512 ctx'], 'mode': 'online'}}, 'slurm_id': 'none', 'n_all_param': 486886912}
2024-10-24 02:27:45,254 WARNING MsgRouterThr:4102 [router.py:message_loop():77] message_loop has been closed
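
Note on the derived data settings: the second config dump (the `config_cb` entry) adds `before_mask_input_length: 568` and `target_length: 114` to the `data` section. These values are consistent with the span-corruption length calculation commonly used for T5-style pre-training, given `input_length: 512`, `mlm_probability: 0.15` and `mean_noise_span_length: 3.0`. The sketch below reproduces that arithmetic. It assumes the run used this standard formula; the function name and structure are illustrative and are not taken from this log.

```python
def compute_input_and_target_lengths(inputs_length, noise_density, mean_noise_span_length):
    """Find the raw token count that, after span corruption, fills
    `inputs_length` encoder positions, plus the matching target length.

    Illustrative sketch of the usual T5 span-corruption arithmetic;
    the name and signature are assumptions, not read from the log.
    """

    def lengths_after_masking(tokens_length):
        # Number of tokens replaced by noise, and number of noise spans.
        num_noise_tokens = int(round(tokens_length * noise_density))
        num_nonnoise_tokens = tokens_length - num_noise_tokens
        num_noise_spans = int(round(num_noise_tokens / mean_noise_span_length))
        # Encoder input: kept tokens + one sentinel per span + EOS.
        encoder_length = num_nonnoise_tokens + num_noise_spans + 1
        # Decoder target: noise tokens + one sentinel per span + EOS.
        decoder_length = num_noise_tokens + num_noise_spans + 1
        return encoder_length, decoder_length

    tokens_length = inputs_length
    # Grow the raw sequence while the corrupted input still fits.
    while lengths_after_masking(tokens_length + 1)[0] <= inputs_length:
        tokens_length += 1

    _, targets_length = lengths_after_masking(tokens_length)
    return tokens_length, targets_length


# Values taken from the 'data' section of the logged config.
raw_len, target_len = compute_input_and_target_lengths(512, 0.15, 3.0)
print(raw_len, target_len)  # 568 114
```

With 568 raw tokens, 85 are masked in about 28 spans, leaving 483 + 28 + 1 = 512 encoder positions and 85 + 28 + 1 = 114 decoder positions, which matches the two values logged above.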