2024-02-08 17:52:38,179 INFO StreamThr :1317 [internal.py:wandb_internal():86] W&B internal server running at pid: 1317, started at: 2024-02-08 17:52:38.179167
2024-02-08 17:52:38,184 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: status
2024-02-08 17:52:38,185 INFO WriterThread:1317 [datastore.py:open_for_write():85] open: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/run-v53k76w9.wandb
2024-02-08 17:52:38,186 DEBUG SenderThread:1317 [sender.py:send():382] send: header
2024-02-08 17:52:38,186 DEBUG SenderThread:1317 [sender.py:send():382] send: run
2024-02-08 17:52:38,455 INFO SenderThread:1317 [dir_watcher.py:__init__():211] watching files in: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files
2024-02-08 17:52:38,455 INFO SenderThread:1317 [sender.py:_start_run_threads():1136] run started: v53k76w9 with start time 1707414758.178795
2024-02-08 17:52:38,459 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: check_version
2024-02-08 17:52:38,459 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: check_version
2024-02-08 17:52:38,542 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: run_start
2024-02-08 17:52:38,571 DEBUG HandlerThread:1317 [system_info.py:__init__():32] System info init
2024-02-08 17:52:38,572 DEBUG HandlerThread:1317 [system_info.py:__init__():47] System info init done
2024-02-08 17:52:38,572 INFO HandlerThread:1317 [system_monitor.py:start():194] Starting system monitor
2024-02-08 17:52:38,572 INFO SystemMonitor:1317 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-02-08 17:52:38,573 INFO HandlerThread:1317 [system_monitor.py:probe():214] Collecting system info
2024-02-08 17:52:38,574 INFO SystemMonitor:1317 [interfaces.py:start():190] Started cpu monitoring
2024-02-08 17:52:38,574 INFO SystemMonitor:1317 [interfaces.py:start():190] Started disk monitoring
2024-02-08 17:52:38,576 INFO SystemMonitor:1317 [interfaces.py:start():190] Started gpu monitoring
2024-02-08 17:52:38,578 INFO SystemMonitor:1317 [interfaces.py:start():190] Started memory monitoring
2024-02-08 17:52:38,579 INFO SystemMonitor:1317 [interfaces.py:start():190] Started network monitoring
2024-02-08 17:52:38,631 DEBUG HandlerThread:1317 [system_info.py:probe():196] Probing system
2024-02-08 17:52:38,633 DEBUG HandlerThread:1317 [gitlib.py:_init_repo():56] git repository is invalid
2024-02-08 17:52:38,633 DEBUG HandlerThread:1317 [system_info.py:probe():244] Probing system done
2024-02-08 17:52:38,633 DEBUG HandlerThread:1317 [system_monitor.py:probe():223] {'os': 'Linux-4.14.336-253.554.amzn2.x86_64-x86_64-with-glibc2.35', 'python': '3.10.13', 'heartbeatAt': '2024-02-08T17:52:38.631590', 'startedAt': '2024-02-08T17:52:38.174980', 'docker': None, 'cuda': None, 'args': (), 'state': 'running', 'program': '/home/sagemaker-user/output-7b-26k-lora/../lora_finetuning_push_to_hub_save_local.py', 'codePathLocal': None, 'host': 'default', 'username': 'sagemaker-user', 'executable': '/opt/conda/bin/python3', 'cpu_count': 96, 'cpu_count_logical': 192, 'cpu_freq': {'current': 3096.191276041665, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 2806.797, 'min': 0.0, 'max': 0.0}, {'current': 2305.192, 'min': 0.0, 'max': 0.0}, {'current': 2429.072, 'min': 0.0, 'max': 0.0}, {'current': 2448.527, 'min': 0.0, 'max': 0.0}, {'current': 2217.0, 'min': 0.0, 'max': 0.0}, {'current': 2733.2, 'min': 0.0, 'max': 0.0}, {'current': 2599.219, 'min': 0.0, 'max': 0.0}, {'current': 2830.092, 'min': 0.0, 'max': 0.0}, {'current': 2856.656, 'min': 0.0, 'max': 0.0}, {'current': 2766.239, 'min': 0.0, 'max': 0.0}, {'current': 2761.423, 'min': 0.0, 'max': 0.0}, {'current': 2600.369, 'min': 0.0, 'max': 0.0}, {'current': 2658.209, 'min': 0.0, 'max': 0.0}, {'current': 2747.075, 'min': 0.0, 'max': 0.0}, {'current': 3300.035, 'min': 0.0, 'max': 0.0}, {'current': 2742.13, 'min': 0.0, 'max': 0.0}, {'current': 2818.903, 'min': 0.0, 'max': 0.0}, {'current': 2743.213, 'min': 0.0, 'max': 0.0}, {'current': 2432.09, 'min': 0.0, 'max': 0.0}, {'current': 2731.02, 'min': 0.0, 'max': 0.0}, {'current': 2808.377, 'min': 0.0, 'max': 0.0}, {'current': 2777.618, 'min': 0.0, 'max': 0.0}, {'current': 2290.979, 'min': 0.0, 'max': 0.0}, {'current': 2230.543, 'min': 0.0, 'max': 0.0}, {'current': 2738.423, 'min': 0.0, 'max': 0.0}, {'current': 2903.95, 'min': 0.0, 'max': 0.0}, {'current': 2970.61, 'min': 0.0, 'max': 0.0}, {'current': 3299.839, 'min': 0.0, 'max': 0.0}, {'current': 2689.335, 'min': 
0.0, 'max': 0.0}, {'current': 2791.925, 'min': 0.0, 'max': 0.0}, {'current': 2731.728, 'min': 0.0, 'max': 0.0}, {'current': 2813.357, 'min': 0.0, 'max': 0.0}, {'current': 2794.296, 'min': 0.0, 'max': 0.0}, {'current': 2747.123, 'min': 0.0, 'max': 0.0}, {'current': 2795.435, 'min': 0.0, 'max': 0.0}, {'current': 2767.017, 'min': 0.0, 'max': 0.0}, {'current': 2722.071, 'min': 0.0, 'max': 0.0}, {'current': 3298.527, 'min': 0.0, 'max': 0.0}, {'current': 2932.725, 'min': 0.0, 'max': 0.0}, {'current': 3292.093, 'min': 0.0, 'max': 0.0}, {'current': 3265.824, 'min': 0.0, 'max': 0.0}, {'current': 3256.045, 'min': 0.0, 'max': 0.0}, {'current': 3256.429, 'min': 0.0, 'max': 0.0}, {'current': 3259.575, 'min': 0.0, 'max': 0.0}, {'current': 2700.636, 'min': 0.0, 'max': 0.0}, {'current': 3234.186, 'min': 0.0, 'max': 0.0}, {'current': 3206.966, 'min': 0.0, 'max': 0.0}, {'current': 3299.085, 'min': 0.0, 'max': 0.0}, {'current': 3282.893, 'min': 0.0, 'max': 0.0}, {'current': 3279.04, 'min': 0.0, 'max': 0.0}, {'current': 3278.154, 'min': 0.0, 'max': 0.0}, {'current': 3283.989, 'min': 0.0, 'max': 0.0}, {'current': 2562.18, 'min': 0.0, 'max': 0.0}, {'current': 2954.006, 'min': 0.0, 'max': 0.0}, {'current': 2762.278, 'min': 0.0, 'max': 0.0}, {'current': 3275.22, 'min': 0.0, 'max': 0.0}, {'current': 3300.85, 'min': 0.0, 'max': 0.0}, {'current': 3291.939, 'min': 0.0, 'max': 0.0}, {'current': 2973.521, 'min': 0.0, 'max': 0.0}, {'current': 2966.002, 'min': 0.0, 'max': 0.0}, {'current': 2966.843, 'min': 0.0, 'max': 0.0}, {'current': 2645.143, 'min': 0.0, 'max': 0.0}, {'current': 3046.118, 'min': 0.0, 'max': 0.0}, {'current': 3006.852, 'min': 0.0, 'max': 0.0}, {'current': 3296.715, 'min': 0.0, 'max': 0.0}, {'current': 2922.754, 'min': 0.0, 'max': 0.0}, {'current': 2906.522, 'min': 0.0, 'max': 0.0}, {'current': 3028.907, 'min': 0.0, 'max': 0.0}, {'current': 2966.081, 'min': 0.0, 'max': 0.0}, {'current': 2917.105, 'min': 0.0, 'max': 0.0}, {'current': 3299.43, 'min': 0.0, 'max': 0.0}, {'current': 
3300.481, 'min': 0.0, 'max': 0.0}, {'current': 3270.344, 'min': 0.0, 'max': 0.0}, {'current': 2930.864, 'min': 0.0, 'max': 0.0}, {'current': 2879.041, 'min': 0.0, 'max': 0.0}, {'current': 2902.742, 'min': 0.0, 'max': 0.0}, {'current': 3300.401, 'min': 0.0, 'max': 0.0}, {'current': 2686.543, 'min': 0.0, 'max': 0.0}, {'current': 3222.046, 'min': 0.0, 'max': 0.0}, {'current': 3298.97, 'min': 0.0, 'max': 0.0}, {'current': 3298.666, 'min': 0.0, 'max': 0.0}, {'current': 2754.074, 'min': 0.0, 'max': 0.0}, {'current': 3299.533, 'min': 0.0, 'max': 0.0}, {'current': 2812.149, 'min': 0.0, 'max': 0.0}, {'current': 3300.31, 'min': 0.0, 'max': 0.0}, {'current': 3300.208, 'min': 0.0, 'max': 0.0}, {'current': 2779.101, 'min': 0.0, 'max': 0.0}, {'current': 3300.477, 'min': 0.0, 'max': 0.0}, {'current': 2825.936, 'min': 0.0, 'max': 0.0}, {'current': 2204.979, 'min': 0.0, 'max': 0.0}, {'current': 2851.77, 'min': 0.0, 'max': 0.0}, {'current': 2797.024, 'min': 0.0, 'max': 0.0}, {'current': 2325.643, 'min': 0.0, 'max': 0.0}, {'current': 2850.865, 'min': 0.0, 'max': 0.0}, {'current': 2919.634, 'min': 0.0, 'max': 0.0}, {'current': 2910.972, 'min': 0.0, 'max': 0.0}, {'current': 2523.164, 'min': 0.0, 'max': 0.0}, {'current': 2297.34, 'min': 0.0, 'max': 0.0}, {'current': 2193.979, 'min': 0.0, 'max': 0.0}, {'current': 2128.798, 'min': 0.0, 'max': 0.0}, {'current': 1907.218, 'min': 0.0, 'max': 0.0}, {'current': 2921.246, 'min': 0.0, 'max': 0.0}, {'current': 2408.454, 'min': 0.0, 'max': 0.0}, {'current': 2296.906, 'min': 0.0, 'max': 0.0}, {'current': 2877.315, 'min': 0.0, 'max': 0.0}, {'current': 2985.576, 'min': 0.0, 'max': 0.0}, {'current': 2977.194, 'min': 0.0, 'max': 0.0}, {'current': 2982.705, 'min': 0.0, 'max': 0.0}, {'current': 2367.542, 'min': 0.0, 'max': 0.0}, {'current': 2232.475, 'min': 0.0, 'max': 0.0}, {'current': 2720.158, 'min': 0.0, 'max': 0.0}, {'current': 2260.753, 'min': 0.0, 'max': 0.0}, {'current': 2215.697, 'min': 0.0, 'max': 0.0}, {'current': 2278.892, 'min': 0.0, 'max': 
0.0}, {'current': 2009.932, 'min': 0.0, 'max': 0.0}, {'current': 2813.45, 'min': 0.0, 'max': 0.0}, {'current': 2248.538, 'min': 0.0, 'max': 0.0}, {'current': 2789.291, 'min': 0.0, 'max': 0.0}, {'current': 2481.076, 'min': 0.0, 'max': 0.0}, {'current': 2033.475, 'min': 0.0, 'max': 0.0}, {'current': 2214.296, 'min': 0.0, 'max': 0.0}, {'current': 2762.868, 'min': 0.0, 'max': 0.0}, {'current': 2273.931, 'min': 0.0, 'max': 0.0}, {'current': 2891.192, 'min': 0.0, 'max': 0.0}, {'current': 2217.993, 'min': 0.0, 'max': 0.0}, {'current': 2306.666, 'min': 0.0, 'max': 0.0}, {'current': 2372.976, 'min': 0.0, 'max': 0.0}, {'current': 2322.672, 'min': 0.0, 'max': 0.0}, {'current': 2325.945, 'min': 0.0, 'max': 0.0}, {'current': 2332.493, 'min': 0.0, 'max': 0.0}, {'current': 2202.398, 'min': 0.0, 'max': 0.0}, {'current': 2130.875, 'min': 0.0, 'max': 0.0}, {'current': 2034.318, 'min': 0.0, 'max': 0.0}, {'current': 2539.829, 'min': 0.0, 'max': 0.0}, {'current': 2088.35, 'min': 0.0, 'max': 0.0}, {'current': 2427.524, 'min': 0.0, 'max': 0.0}, {'current': 2432.02, 'min': 0.0, 'max': 0.0}, {'current': 2521.716, 'min': 0.0, 'max': 0.0}, {'current': 3047.178, 'min': 0.0, 'max': 0.0}, {'current': 2452.92, 'min': 0.0, 'max': 0.0}, {'current': 2398.052, 'min': 0.0, 'max': 0.0}, {'current': 2930.232, 'min': 0.0, 'max': 0.0}, {'current': 2915.194, 'min': 0.0, 'max': 0.0}, {'current': 3050.935, 'min': 0.0, 'max': 0.0}, {'current': 2985.592, 'min': 0.0, 'max': 0.0}, {'current': 2999.519, 'min': 0.0, 'max': 0.0}, {'current': 2954.304, 'min': 0.0, 'max': 0.0}, {'current': 3253.761, 'min': 0.0, 'max': 0.0}, {'current': 2547.987, 'min': 0.0, 'max': 0.0}, {'current': 2791.034, 'min': 0.0, 'max': 0.0}, {'current': 2669.218, 'min': 0.0, 'max': 0.0}, {'current': 3304.846, 'min': 0.0, 'max': 0.0}, {'current': 3017.308, 'min': 0.0, 'max': 0.0}, {'current': 3299.861, 'min': 0.0, 'max': 0.0}, {'current': 2977.232, 'min': 0.0, 'max': 0.0}, {'current': 2939.823, 'min': 0.0, 'max': 0.0}, {'current': 3300.543, 
'min': 0.0, 'max': 0.0}, {'current': 3014.24, 'min': 0.0, 'max': 0.0}, {'current': 3299.908, 'min': 0.0, 'max': 0.0}, {'current': 3014.885, 'min': 0.0, 'max': 0.0}, {'current': 3297.521, 'min': 0.0, 'max': 0.0}, {'current': 3296.848, 'min': 0.0, 'max': 0.0}, {'current': 3297.858, 'min': 0.0, 'max': 0.0}, {'current': 3296.813, 'min': 0.0, 'max': 0.0}, {'current': 2998.973, 'min': 0.0, 'max': 0.0}, {'current': 3299.759, 'min': 0.0, 'max': 0.0}, {'current': 3026.427, 'min': 0.0, 'max': 0.0}, {'current': 3300.35, 'min': 0.0, 'max': 0.0}, {'current': 2507.162, 'min': 0.0, 'max': 0.0}, {'current': 3250.875, 'min': 0.0, 'max': 0.0}, {'current': 3299.582, 'min': 0.0, 'max': 0.0}, {'current': 3299.791, 'min': 0.0, 'max': 0.0}, {'current': 2876.895, 'min': 0.0, 'max': 0.0}, {'current': 3300.637, 'min': 0.0, 'max': 0.0}, {'current': 3299.935, 'min': 0.0, 'max': 0.0}, {'current': 3299.409, 'min': 0.0, 'max': 0.0}, {'current': 3299.545, 'min': 0.0, 'max': 0.0}, {'current': 2845.582, 'min': 0.0, 'max': 0.0}, {'current': 3298.789, 'min': 0.0, 'max': 0.0}, {'current': 3212.048, 'min': 0.0, 'max': 0.0}, {'current': 2598.735, 'min': 0.0, 'max': 0.0}, {'current': 3299.632, 'min': 0.0, 'max': 0.0}, {'current': 3299.179, 'min': 0.0, 'max': 0.0}, {'current': 3298.805, 'min': 0.0, 'max': 0.0}, {'current': 3296.982, 'min': 0.0, 'max': 0.0}, {'current': 2498.549, 'min': 0.0, 'max': 0.0}, {'current': 3296.222, 'min': 0.0, 'max': 0.0}, {'current': 3297.448, 'min': 0.0, 'max': 0.0}, {'current': 2830.786, 'min': 0.0, 'max': 0.0}, {'current': 3299.116, 'min': 0.0, 'max': 0.0}, {'current': 3299.39, 'min': 0.0, 'max': 0.0}, {'current': 3299.373, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 32.0, 'used': 0.012481689453125}}, 'gpu': 'NVIDIA A10G', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, 
{'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}], 'memory': {'total': 747.9597625732422}}
2024-02-08 17:52:38,634 INFO HandlerThread:1317 [system_monitor.py:probe():224] Finished collecting system info
2024-02-08 17:52:38,634 INFO HandlerThread:1317 [system_monitor.py:probe():227] Publishing system info
2024-02-08 17:52:38,634 DEBUG HandlerThread:1317 [system_info.py:_save_pip():52] Saving list of pip packages installed into the current environment
2024-02-08 17:52:38,634 DEBUG HandlerThread:1317 [system_info.py:_save_pip():68] Saving pip packages done
2024-02-08 17:52:38,634 DEBUG HandlerThread:1317 [system_info.py:_save_conda():75] Saving list of conda packages installed into the current environment
2024-02-08 17:52:39,456 INFO Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml
2024-02-08 17:52:39,457 INFO Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/requirements.txt
2024-02-08 17:52:52,948 DEBUG HandlerThread:1317 [system_info.py:_save_conda():87] Saving conda packages done
2024-02-08 17:52:52,950 INFO HandlerThread:1317 [system_monitor.py:probe():229] Finished publishing system info
2024-02-08 17:52:52,954 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:52,954 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 17:52:52,954 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:52,954 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 17:52:52,955 DEBUG SenderThread:1317 [sender.py:send():382] send: files
2024-02-08 17:52:52,955 INFO SenderThread:1317 [sender.py:_save_file():1392] saving file wandb-metadata.json with policy now
2024-02-08 17:52:52,961 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: stop_status
2024-02-08 17:52:52,962 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: stop_status
2024-02-08 17:52:52,964 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 17:52:53,118 DEBUG SenderThread:1317 [sender.py:send():382] send: telemetry
2024-02-08 17:52:53,118 DEBUG SenderThread:1317 [sender.py:send():382] send: config
2024-02-08 17:52:53,118 DEBUG SenderThread:1317 [sender.py:send():382] send: metric
2024-02-08 17:52:53,118 DEBUG SenderThread:1317 [sender.py:send():382] send: telemetry
2024-02-08 17:52:53,119 DEBUG SenderThread:1317 [sender.py:send():382] send: metric
2024-02-08 17:52:53,119 WARNING SenderThread:1317 [sender.py:send_metric():1343] Seen metric with glob (shouldn't happen)
2024-02-08 17:52:53,356 INFO wandb-upload_0:1317 [upload_job.py:push():131] Uploaded file /tmp/tmpftpllcuxwandb/1bgc597r-wandb-metadata.json
2024-02-08 17:52:53,459 INFO Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml
2024-02-08 17:52:53,459 INFO Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-metadata.json
2024-02-08 17:52:53,459 INFO Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:53,833 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:55,459 INFO Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:55,914 DEBUG SenderThread:1317 [sender.py:send():382] send: exit
2024-02-08 17:52:55,914 INFO SenderThread:1317 [sender.py:send_exit():589] handling exit code: 1
2024-02-08 17:52:55,914 INFO SenderThread:1317 [sender.py:send_exit():591] handling runtime: 17
2024-02-08 17:52:55,915 INFO SenderThread:1317 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 17:52:55,915 INFO SenderThread:1317 [sender.py:send_exit():597] send defer
2024-02-08 17:52:55,915 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,915 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 0
2024-02-08 17:52:55,916 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,916 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 0
2024-02-08 17:52:55,916 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 1
2024-02-08 17:52:55,916 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,916 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 1
2024-02-08 17:52:55,916 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,916 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 1
2024-02-08 17:52:55,916 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 2
2024-02-08 17:52:55,916 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,916 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 2
2024-02-08 17:52:55,916 INFO HandlerThread:1317 [system_monitor.py:finish():203] Stopping system monitor
2024-02-08 17:52:55,917 INFO HandlerThread:1317 [interfaces.py:finish():202] Joined cpu monitor
2024-02-08 17:52:55,917 INFO HandlerThread:1317 [interfaces.py:finish():202] Joined disk monitor
2024-02-08 17:52:55,918 DEBUG SystemMonitor:1317 [system_monitor.py:_start():172] Starting system metrics aggregation loop
2024-02-08 17:52:55,918 DEBUG SystemMonitor:1317 [system_monitor.py:_start():179] Finished system metrics aggregation loop
2024-02-08 17:52:55,918 DEBUG SystemMonitor:1317 [system_monitor.py:_start():183] Publishing last batch of metrics
2024-02-08 17:52:55,956 INFO HandlerThread:1317 [interfaces.py:finish():202] Joined gpu monitor
2024-02-08 17:52:55,956 INFO HandlerThread:1317 [interfaces.py:finish():202] Joined memory monitor
2024-02-08 17:52:55,956 INFO HandlerThread:1317 [interfaces.py:finish():202] Joined network monitor
2024-02-08 17:52:55,957 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,957 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 2
2024-02-08 17:52:55,957 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 3
2024-02-08 17:52:55,957 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,958 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 3
2024-02-08 17:52:55,958 DEBUG SenderThread:1317 [sender.py:send():382] send: stats
2024-02-08 17:52:55,959 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,959 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 3
2024-02-08 17:52:55,959 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 4
2024-02-08 17:52:55,959 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,959 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 4
2024-02-08 17:52:55,959 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,959 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 4
2024-02-08 17:52:55,959 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 5
2024-02-08 17:52:55,959 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,959 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 5
2024-02-08 17:52:55,960 DEBUG SenderThread:1317 [sender.py:send():382] send: summary
2024-02-08 17:52:55,961 INFO SenderThread:1317 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 17:52:55,961 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,961 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 5
2024-02-08 17:52:55,961 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 6
2024-02-08 17:52:55,961 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,961 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 6
2024-02-08 17:52:55,961 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,961 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 6
2024-02-08 17:52:55,966 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:56,102 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 7
2024-02-08 17:52:56,102 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:56,102 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 7
2024-02-08 17:52:56,103 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:56,103 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 7
2024-02-08 17:52:56,459 INFO Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/config.yaml
2024-02-08 17:52:56,459 INFO Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-summary.json
2024-02-08 17:52:56,914 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:57,129 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 8
2024-02-08 17:52:57,129 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:57,130 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:57,130 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 8
2024-02-08 17:52:57,130 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:57,130 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 8
2024-02-08 17:52:57,130 INFO SenderThread:1317 [job_builder.py:build():298] Attempting to build job artifact
2024-02-08 17:52:57,131 INFO SenderThread:1317 [job_builder.py:_get_source_type():439] no source found
2024-02-08 17:52:57,131 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 9
2024-02-08 17:52:57,131 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:57,131 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 9
2024-02-08 17:52:57,132 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:57,132 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 9
2024-02-08 17:52:57,132 INFO SenderThread:1317 [dir_watcher.py:finish():358] shutting down directory watcher
2024-02-08 17:52:57,460 INFO Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:57,460 INFO SenderThread:1317 [dir_watcher.py:finish():388] scan: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files
2024-02-08 17:52:57,460 INFO SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/config.yaml config.yaml
2024-02-08 17:52:57,460 INFO SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/requirements.txt requirements.txt
2024-02-08 17:52:57,460 INFO SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml conda-environment.yaml
2024-02-08 17:52:57,461 INFO SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-metadata.json wandb-metadata.json
2024-02-08 17:52:57,461 INFO SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log output.log
2024-02-08 17:52:57,463 INFO SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-summary.json wandb-summary.json
2024-02-08 17:52:57,464 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 10
2024-02-08 17:52:57,467 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:57,467 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 10
2024-02-08 17:52:57,468 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:57,468 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 10
2024-02-08 17:52:57,468 INFO SenderThread:1317 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 17:52:57,674 INFO wandb-upload_0:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/requirements.txt
2024-02-08 17:52:57,753 INFO wandb-upload_1:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/config.yaml
2024-02-08 17:52:57,791 INFO wandb-upload_3:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:57,800 INFO wandb-upload_2:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml
2024-02-08 17:52:57,804 INFO wandb-upload_4:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-summary.json
2024-02-08 17:52:57,915 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:57,915 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:58,004 INFO Thread-11 (_thread_body):1317 [sender.py:transition_state():617] send defer: 11
2024-02-08 17:52:58,004 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,004 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 11
2024-02-08 17:52:58,005 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,005 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 11
2024-02-08 17:52:58,005 INFO SenderThread:1317 [file_pusher.py:join():181] waiting for file pusher
2024-02-08 17:52:58,005 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 12
2024-02-08 17:52:58,005 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,005 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 12
2024-02-08 17:52:58,006 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,006 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 12
2024-02-08 17:52:58,006 INFO SenderThread:1317 [file_stream.py:finish():595] file stream finish called
2024-02-08 17:52:58,071 INFO SenderThread:1317 [file_stream.py:finish():599] file stream finish is done
2024-02-08 17:52:58,071 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 13
2024-02-08 17:52:58,071 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,071 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 13
2024-02-08 17:52:58,071 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,071 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 13
2024-02-08 17:52:58,071 INFO SenderThread:1317 [sender.py:transition_state():617] send defer: 14
2024-02-08 17:52:58,071 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,071 INFO HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 14
2024-02-08 17:52:58,072 DEBUG SenderThread:1317 [sender.py:send():382] send: final
2024-02-08 17:52:58,072 DEBUG SenderThread:1317 [sender.py:send():382] send: footer
2024-02-08 17:52:58,072 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,072 INFO SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 14
2024-02-08 17:52:58,072 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:58,072 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:58,073 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:58,073 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:58,073 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: server_info
2024-02-08 17:52:58,073 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: server_info
2024-02-08 17:52:58,075 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: get_summary
2024-02-08 17:52:58,075 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: sampled_history
2024-02-08 17:52:58,076 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 17:52:58,076 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: job_info
2024-02-08 17:52:58,121 DEBUG SenderThread:1317 [sender.py:send_request():409] send_request: job_info
2024-02-08 17:52:58,122 INFO MainThread:1317 [wandb_run.py:_footer_history_summary_info():3837] rendering history
2024-02-08 17:52:58,122 INFO MainThread:1317 [wandb_run.py:_footer_history_summary_info():3869] rendering summary
2024-02-08 17:52:58,122 INFO MainThread:1317 [wandb_run.py:_footer_sync_info():3796] logging synced files
2024-02-08 17:52:58,122 DEBUG HandlerThread:1317 [handler.py:handle_request():146] handle_request: shutdown
2024-02-08 17:52:58,122 INFO HandlerThread:1317 [handler.py:finish():866] shutting down handler
2024-02-08 17:52:59,076 INFO WriterThread:1317 [datastore.py:close():294] close: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/run-v53k76w9.wandb
2024-02-08 17:52:59,122 INFO SenderThread:1317 [sender.py:finish():1548] shutting down sender
2024-02-08 17:52:59,122 INFO SenderThread:1317 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 17:52:59,122 INFO SenderThread:1317 [file_pusher.py:join():181] waiting for file pusher