2024-08-30 19:59:24,178 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Current SDK version is 0.17.8
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Configure stats pid to 29052
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Loading settings from /root/.config/wandb/settings
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Loading settings from /workspace/nanoT5/outputs/2024-08-30/19-59-22/wandb/settings
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Loading settings from environment variables: {}
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Applying setup settings: {'_disable_service': False}
2024-08-30 19:59:24,179 WARNING MainThread:29052 [wandb_setup.py:_flush():77] Could not find program at -m nanoT5.main
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Inferring run settings from compute environment: {'program_relpath': None, 'program': '-m nanoT5.main'}
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_setup.py:_flush():77] Applying login settings: {}
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_init.py:_log_setup():524] Logging user logs to /workspace/nanoT5/outputs/2024-08-30/19-59-22/wandb/run-20240830_195924-mao0tqjy/logs/debug.log
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_init.py:_log_setup():525] Logging internal logs to /workspace/nanoT5/outputs/2024-08-30/19-59-22/wandb/run-20240830_195924-mao0tqjy/logs/debug-internal.log
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_init.py:init():607] calling init triggers
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_init.py:init():614] wandb.init called with sweep_config: {}
config: {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 34534, 'model': {'klass': 'hf_t5', 'name': 'pszemraj/tFINE-900m-e16-d32', 'overwrite': {'dropout_rate': 0.0}, 'checkpoint_path': '', 'random_init': False, 'compile': True}, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 16}, 'optim': {'name': 'adamwscale', 'base_lr': 0.01, 'batch_size': 128, 'total_steps': 20000, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.0001, 'grad_clip': 1.0, 'grad_acc': 8, 'final_cosine': 2e-05}, 'eval': {'every_steps': 1000000000, 'steps': 500}, 'checkpoint': {'every_steps': 2500}, 'logging': {'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'pszemraj', 'tags': ['900m', '1024'], 'mode': 'online'}, 'every_steps': 25, 'grad_l2': True, 'weights_l2': True}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/outputs/2024-08-30/19-59-22'}
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_init.py:init():657] starting backend
2024-08-30 19:59:24,179 INFO    MainThread:29052 [wandb_init.py:init():661] setting up manager
2024-08-30 19:59:24,185 INFO    MainThread:29052 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-08-30 19:59:24,187 INFO    MainThread:29052 [wandb_init.py:init():669] backend started and connected
2024-08-30 19:59:24,192 INFO    MainThread:29052 [wandb_init.py:init():767] updated telemetry
2024-08-30 19:59:24,198 INFO    MainThread:29052 [wandb_init.py:init():800] communicating run to backend with 90.0 second timeout
2024-08-30 19:59:24,583 INFO    MainThread:29052 [wandb_init.py:init():851] starting run threads in backend
2024-08-30 19:59:24,814 INFO    MainThread:29052 [wandb_run.py:_console_start():2463] atexit reg
2024-08-30 19:59:24,814 INFO    MainThread:29052 [wandb_run.py:_redirect():2309] redirect: wrap_raw
2024-08-30 19:59:24,814 INFO    MainThread:29052 [wandb_run.py:_redirect():2374] Wrapping output streams.
2024-08-30 19:59:24,815 INFO    MainThread:29052 [wandb_run.py:_redirect():2399] Redirects installed.
2024-08-30 19:59:24,818 INFO    MainThread:29052 [wandb_init.py:init():894] run started, returning control to user process
2024-08-30 19:59:44,796 INFO    MainThread:29052 [wandb_run.py:_config_callback():1392] config_cb None None {'mode': 'pt', 'device': 'gpu', 'precision': 'bf16', 'eval_only': False, 'predict_only': False, 'seed': 34534, 'model': {'klass': 'hf_t5', 'name': 'pszemraj/tFINE-900m-e16-d32', 'overwrite': {'dropout_rate': 0.0}, 'checkpoint_path': '', 'random_init': False, 'compile': True}, 'tokenizer': {'name': 'BEE-spoke-data/slimpajama_tok-48128-BPE-forT5'}, 'data': {'input_length': 1024, 'mlm_probability': 0.15, 'mean_noise_span_length': 3.0, 'num_workers': 16, 'before_mask_input_length': 1137, 'target_length': 229}, 'optim': {'name': 'adamwscale', 'base_lr': 0.01, 'batch_size': 128, 'total_steps': 20000, 'epochs': -1, 'warmup_steps': 5000, 'lr_scheduler': 'cosine', 'weight_decay': 0.0001, 'grad_clip': 1.0, 'grad_acc': 8, 'final_cosine': 2e-05}, 'eval': {'every_steps': 1000000000, 'steps': 500, 'corrected_steps': 500}, 'checkpoint': {'every_steps': 2500}, 'logging': {'use_wandb': True, 'wandb_config': {'project': 'nanoT5', 'entity': 'pszemraj', 'tags': ['900m', '1024'], 'mode': 'online'}, 'every_steps': 25, 'grad_l2': True, 'weights_l2': True}, 'slurm_id': 'none', 'working_dir': '/workspace/nanoT5/outputs/2024-08-30/19-59-22', 'n_all_param': 887492096}
2024-08-31 23:32:00,793 INFO    MainThread:29052 [wandb_run.py:_finish():2160] finishing run pszemraj/nanoT5/mao0tqjy
2024-08-31 23:32:00,796 INFO    MainThread:29052 [wandb_run.py:_atexit_cleanup():2424] got exitcode: 0
2024-08-31 23:32:00,797 INFO    MainThread:29052 [wandb_run.py:_restore():2406] restore
2024-08-31 23:32:00,798 INFO    MainThread:29052 [wandb_run.py:_restore():2412] restore done
2024-08-31 23:32:00,799 INFO    MainThread:29052 [wandb_run.py:_on_finish():2677] communicating current version
2024-08-31 23:32:00,827 INFO    MainThread:29052 [wandb_run.py:_on_finish():2686] got version response 
2024-08-31 23:32:06,426 INFO    MainThread:29052 [wandb_run.py:_footer_history_summary_info():4078] rendering history
2024-08-31 23:32:06,427 INFO    MainThread:29052 [wandb_run.py:_footer_history_summary_info():4110] rendering summary
2024-08-31 23:32:06,433 INFO    MainThread:29052 [wandb_run.py:_footer_sync_info():4037] logging synced files