2024-04-25 23:50:40,536 INFO StreamThr :213806 [internal.py:wandb_internal():86] W&B internal server running at pid: 213806, started at: 2024-04-25 23:50:40.534762
2024-04-25 23:50:40,537 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: status
2024-04-25 23:50:40,538 INFO WriterThread:213806 [datastore.py:open_for_write():87] open: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/run-eo45cto5.wandb
2024-04-25 23:50:40,541 DEBUG SenderThread:213806 [sender.py:send():379] send: header
2024-04-25 23:50:40,552 DEBUG SenderThread:213806 [sender.py:send():379] send: run
2024-04-25 23:50:40,750 INFO SenderThread:213806 [dir_watcher.py:__init__():211] watching files in: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files
2024-04-25 23:50:40,750 INFO SenderThread:213806 [sender.py:_start_run_threads():1124] run started: eo45cto5 with start time 1714089040.534899
2024-04-25 23:50:40,757 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: check_version
2024-04-25 23:50:40,757 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: check_version
2024-04-25 23:50:40,831 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: run_start
2024-04-25 23:50:40,888 DEBUG HandlerThread:213806 [system_info.py:__init__():26] System info init
2024-04-25 23:50:40,888 DEBUG HandlerThread:213806 [system_info.py:__init__():41] System info init done
2024-04-25 23:50:40,888 INFO HandlerThread:213806 [system_monitor.py:start():194] Starting system monitor
2024-04-25 23:50:40,888 INFO SystemMonitor:213806 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-04-25 23:50:40,888 INFO HandlerThread:213806 [system_monitor.py:probe():214] Collecting system info
2024-04-25 23:50:40,889 INFO SystemMonitor:213806 [interfaces.py:start():190] Started cpu monitoring
2024-04-25 23:50:40,889 INFO SystemMonitor:213806 [interfaces.py:start():190] Started disk monitoring
2024-04-25 23:50:40,890 INFO SystemMonitor:213806 [interfaces.py:start():190] Started gpu monitoring
2024-04-25 23:50:40,890 INFO SystemMonitor:213806 [interfaces.py:start():190] Started memory monitoring
2024-04-25 23:50:40,891 INFO SystemMonitor:213806 [interfaces.py:start():190] Started network monitoring
2024-04-25 23:50:40,938 DEBUG HandlerThread:213806 [system_info.py:probe():150] Probing system
2024-04-25 23:50:40,940 DEBUG HandlerThread:213806 [system_info.py:_probe_git():135] Probing git
2024-04-25 23:50:40,958 DEBUG HandlerThread:213806 [system_info.py:_probe_git():143] Probing git done
2024-04-25 23:50:40,958 DEBUG HandlerThread:213806 [system_info.py:probe():198] Probing system done
2024-04-25 23:50:40,958 DEBUG HandlerThread:213806 [system_monitor.py:probe():223] {'os': 'Linux-5.15.0-1048-aws-x86_64-with-glibc2.31', 'python': '3.11.9', 'heartbeatAt': '2024-04-25T23:50:40.938569', 'startedAt': '2024-04-25T23:50:40.520650', 'docker': None, 'cuda': None, 'args': ('./config_full.yaml',), 'state': 'running', 'program': '/fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/./run_sft.py', 'codePathLocal': 'run_sft.py', 'codePath': 'run_sft.py', 'git': {'remote': 'https://huggingface.co/sanchit-gandhi/distil-zephyr-1.5b-ssft-ultrachat', 'commit': 'cbea69c6b95c970317a1e47c3f614b55b33f8ed9'}, 'email': None, 'root': '/fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat', 'host': 'ip-26-0-167-177', 'username': 'sanchit', 'executable': '/fsx/sanchit/miniconda3/envs/alignment/bin/python', 'cpu_count': 96, 'cpu_count_logical': 96, 'cpu_freq': {'current': 2728.637187499998, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3595.268, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 
'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3597.362, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3597.031, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3597.27, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3596.759, 'min': 0.0, 'max': 0.0}, {'current': 
2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3597.114, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3571.35, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 3597.192, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}, {'current': 2649.998, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 290.7472343444824, 'used': 58.58916091918945}}, 'gpu': 'NVIDIA H100 80GB HBM3', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 
'memory_total': 85520809984}, {'name': 'NVIDIA H100 80GB HBM3', 'memory_total': 85520809984}], 'memory': {'total': 1999.9855155944824}}
2024-04-25 23:50:40,958 INFO HandlerThread:213806 [system_monitor.py:probe():224] Finished collecting system info
2024-04-25 23:50:40,958 INFO HandlerThread:213806 [system_monitor.py:probe():227] Publishing system info
2024-04-25 23:50:40,958 DEBUG HandlerThread:213806 [system_info.py:_save_conda():207] Saving list of conda packages installed into the current environment
2024-04-25 23:50:41,752 INFO Thread-12 :213806 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/conda-environment.yaml
2024-04-25 23:50:43,377 DEBUG HandlerThread:213806 [system_info.py:_save_conda():222] Saving conda packages done
2024-04-25 23:50:43,380 INFO HandlerThread:213806 [system_monitor.py:probe():229] Finished publishing system info
2024-04-25 23:50:43,397 DEBUG SenderThread:213806 [sender.py:send():379] send: files
2024-04-25 23:50:43,397 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-metadata.json with policy now
2024-04-25 23:50:43,537 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: python_packages
2024-04-25 23:50:43,537 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: python_packages
2024-04-25 23:50:43,538 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: stop_status
2024-04-25 23:50:43,538 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: internal_messages
2024-04-25 23:50:43,540 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: stop_status
2024-04-25 23:50:43,627 DEBUG SenderThread:213806 [sender.py:send():379] send: telemetry
2024-04-25 23:50:43,627 DEBUG SenderThread:213806 [sender.py:send():379] send: config
2024-04-25 23:50:43,629 DEBUG SenderThread:213806 [sender.py:send():379] send: metric
2024-04-25 23:50:43,629 DEBUG SenderThread:213806 [sender.py:send():379] send: telemetry
2024-04-25 23:50:43,629 DEBUG SenderThread:213806 [sender.py:send():379] send: metric
2024-04-25 23:50:43,629 WARNING SenderThread:213806 [sender.py:send_metric():1341] Seen metric with glob (shouldn't happen)
2024-04-25 23:50:43,629 DEBUG SenderThread:213806 [sender.py:send():379] send: telemetry
2024-04-25 23:50:43,655 INFO wandb-upload_0:213806 [upload_job.py:push():131] Uploaded file /tmp/tmpm1msy96mwandb/zrzqhhnh-wandb-metadata.json
2024-04-25 23:50:43,754 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/conda-environment.yaml
2024-04-25 23:50:43,754 INFO Thread-12 :213806 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/requirements.txt
2024-04-25 23:50:43,754 INFO Thread-12 :213806 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/wandb-metadata.json
2024-04-25 23:50:43,754 INFO Thread-12 :213806 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:50:45,631 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:50:45,756 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:50:46,334 DEBUG SenderThread:213806 [sender.py:send():379] send: telemetry
2024-04-25 23:50:46,334 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:50:46,335 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: partial_history
2024-04-25 23:50:46,338 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:50:46,338 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:50:46,339 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:50:46,339 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:50:46,340 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:50:46,340 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:50:46,341 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:50:46,342 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:50:46,343 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:50:46,343 DEBUG SenderThread:213806 [sender.py:send():379] send: metric
2024-04-25 23:50:46,343 DEBUG SenderThread:213806 [sender.py:send():379] send: history
2024-04-25 23:50:46,343 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:50:46,344 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:50:46,758 INFO Thread-12 :213806 [dir_watcher.py:_on_file_created():271] file/dir created: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/wandb-summary.json
2024-04-25 23:50:47,759 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:50:47,862 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: partial_history
2024-04-25 23:50:47,863 DEBUG SenderThread:213806 [sender.py:send():379] send: metric
2024-04-25 23:50:47,864 DEBUG SenderThread:213806 [sender.py:send():379] send: metric
2024-04-25 23:50:47,864 DEBUG SenderThread:213806 [sender.py:send():379] send: metric
2024-04-25 23:50:47,865 DEBUG SenderThread:213806 [sender.py:send():379] send: metric
2024-04-25 23:50:47,866 DEBUG SenderThread:213806 [sender.py:send():379] send: history
2024-04-25 23:50:47,866 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: summary_record
2024-04-25 23:50:47,867 INFO SenderThread:213806 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
2024-04-25 23:50:48,761 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/wandb-summary.json
2024-04-25 23:50:48,761 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:50:49,762 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:50:50,763 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:50:50,871 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:50:53,767 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:50:56,849 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:50:58,539 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: stop_status
2024-04-25 23:50:58,539 DEBUG SenderThread:213806 [sender.py:send_request():406] send_request: stop_status
2024-04-25 23:50:58,539 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: internal_messages
2024-04-25 23:50:59,774 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:51:01,776 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:51:02,336 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:51:03,778 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:51:04,779 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log
2024-04-25 23:51:07,341 DEBUG HandlerThread:213806 [handler.py:handle_request():146] handle_request: status_report
2024-04-25 23:51:07,782 INFO Thread-12 :213806 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft-ultrachat/wandb/run-20240425_235040-eo45cto5/files/output.log