2024-02-08 17:52:38,179 INFO    StreamThr :1317 [internal.py:wandb_internal():86] W&B internal server running at pid: 1317, started at: 2024-02-08 17:52:38.179167
2024-02-08 17:52:38,184 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: status
2024-02-08 17:52:38,185 INFO    WriterThread:1317 [datastore.py:open_for_write():85] open: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/run-v53k76w9.wandb
2024-02-08 17:52:38,186 DEBUG   SenderThread:1317 [sender.py:send():382] send: header
2024-02-08 17:52:38,186 DEBUG   SenderThread:1317 [sender.py:send():382] send: run
2024-02-08 17:52:38,455 INFO    SenderThread:1317 [dir_watcher.py:__init__():211] watching files in: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files
2024-02-08 17:52:38,455 INFO    SenderThread:1317 [sender.py:_start_run_threads():1136] run started: v53k76w9 with start time 1707414758.178795
2024-02-08 17:52:38,459 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: check_version
2024-02-08 17:52:38,459 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: check_version
2024-02-08 17:52:38,542 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: run_start
2024-02-08 17:52:38,571 DEBUG   HandlerThread:1317 [system_info.py:__init__():32] System info init
2024-02-08 17:52:38,572 DEBUG   HandlerThread:1317 [system_info.py:__init__():47] System info init done
2024-02-08 17:52:38,572 INFO    HandlerThread:1317 [system_monitor.py:start():194] Starting system monitor
2024-02-08 17:52:38,572 INFO    SystemMonitor:1317 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-02-08 17:52:38,573 INFO    HandlerThread:1317 [system_monitor.py:probe():214] Collecting system info
2024-02-08 17:52:38,574 INFO    SystemMonitor:1317 [interfaces.py:start():190] Started cpu monitoring
2024-02-08 17:52:38,574 INFO    SystemMonitor:1317 [interfaces.py:start():190] Started disk monitoring
2024-02-08 17:52:38,576 INFO    SystemMonitor:1317 [interfaces.py:start():190] Started gpu monitoring
2024-02-08 17:52:38,578 INFO    SystemMonitor:1317 [interfaces.py:start():190] Started memory monitoring
2024-02-08 17:52:38,579 INFO    SystemMonitor:1317 [interfaces.py:start():190] Started network monitoring
2024-02-08 17:52:38,631 DEBUG   HandlerThread:1317 [system_info.py:probe():196] Probing system
2024-02-08 17:52:38,633 DEBUG   HandlerThread:1317 [gitlib.py:_init_repo():56] git repository is invalid
2024-02-08 17:52:38,633 DEBUG   HandlerThread:1317 [system_info.py:probe():244] Probing system done
2024-02-08 17:52:38,633 DEBUG   HandlerThread:1317 [system_monitor.py:probe():223] {'os': 'Linux-4.14.336-253.554.amzn2.x86_64-x86_64-with-glibc2.35', 'python': '3.10.13', 'heartbeatAt': '2024-02-08T17:52:38.631590', 'startedAt': '2024-02-08T17:52:38.174980', 'docker': None, 'cuda': None, 'args': (), 'state': 'running', 'program': '/home/sagemaker-user/output-7b-26k-lora/../lora_finetuning_push_to_hub_save_local.py', 'codePathLocal': None, 'host': 'default', 'username': 'sagemaker-user', 'executable': '/opt/conda/bin/python3', 'cpu_count': 96, 'cpu_count_logical': 192, 'cpu_freq': {'current': 3096.191276041665, 'min': 0.0, 'max': 0.0}, 'cpu_freq_per_core': [{'current': 2806.797, 'min': 0.0, 'max': 0.0}, {'current': 2305.192, 'min': 0.0, 'max': 0.0}, {'current': 2429.072, 'min': 0.0, 'max': 0.0}, {'current': 2448.527, 'min': 0.0, 'max': 0.0}, {'current': 2217.0, 'min': 0.0, 'max': 0.0}, {'current': 2733.2, 'min': 0.0, 'max': 0.0}, {'current': 2599.219, 'min': 0.0, 'max': 0.0}, {'current': 2830.092, 'min': 0.0, 'max': 0.0}, {'current': 2856.656, 'min': 0.0, 'max': 0.0}, {'current': 2766.239, 'min': 0.0, 'max': 0.0}, {'current': 2761.423, 'min': 0.0, 'max': 0.0}, {'current': 2600.369, 'min': 0.0, 'max': 0.0}, {'current': 2658.209, 'min': 0.0, 'max': 0.0}, {'current': 2747.075, 'min': 0.0, 'max': 0.0}, {'current': 3300.035, 'min': 0.0, 'max': 0.0}, {'current': 2742.13, 'min': 0.0, 'max': 0.0}, {'current': 2818.903, 'min': 0.0, 'max': 0.0}, {'current': 2743.213, 'min': 0.0, 'max': 0.0}, {'current': 2432.09, 'min': 0.0, 'max': 0.0}, {'current': 2731.02, 'min': 0.0, 'max': 0.0}, {'current': 2808.377, 'min': 0.0, 'max': 0.0}, {'current': 2777.618, 'min': 0.0, 'max': 0.0}, {'current': 2290.979, 'min': 0.0, 'max': 0.0}, {'current': 2230.543, 'min': 0.0, 'max': 0.0}, {'current': 2738.423, 'min': 0.0, 'max': 0.0}, {'current': 2903.95, 'min': 0.0, 'max': 0.0}, {'current': 2970.61, 'min': 0.0, 'max': 0.0}, {'current': 3299.839, 'min': 0.0, 'max': 0.0}, {'current': 2689.335, 
'min': 0.0, 'max': 0.0}, {'current': 2791.925, 'min': 0.0, 'max': 0.0}, {'current': 2731.728, 'min': 0.0, 'max': 0.0}, {'current': 2813.357, 'min': 0.0, 'max': 0.0}, {'current': 2794.296, 'min': 0.0, 'max': 0.0}, {'current': 2747.123, 'min': 0.0, 'max': 0.0}, {'current': 2795.435, 'min': 0.0, 'max': 0.0}, {'current': 2767.017, 'min': 0.0, 'max': 0.0}, {'current': 2722.071, 'min': 0.0, 'max': 0.0}, {'current': 3298.527, 'min': 0.0, 'max': 0.0}, {'current': 2932.725, 'min': 0.0, 'max': 0.0}, {'current': 3292.093, 'min': 0.0, 'max': 0.0}, {'current': 3265.824, 'min': 0.0, 'max': 0.0}, {'current': 3256.045, 'min': 0.0, 'max': 0.0}, {'current': 3256.429, 'min': 0.0, 'max': 0.0}, {'current': 3259.575, 'min': 0.0, 'max': 0.0}, {'current': 2700.636, 'min': 0.0, 'max': 0.0}, {'current': 3234.186, 'min': 0.0, 'max': 0.0}, {'current': 3206.966, 'min': 0.0, 'max': 0.0}, {'current': 3299.085, 'min': 0.0, 'max': 0.0}, {'current': 3282.893, 'min': 0.0, 'max': 0.0}, {'current': 3279.04, 'min': 0.0, 'max': 0.0}, {'current': 3278.154, 'min': 0.0, 'max': 0.0}, {'current': 3283.989, 'min': 0.0, 'max': 0.0}, {'current': 2562.18, 'min': 0.0, 'max': 0.0}, {'current': 2954.006, 'min': 0.0, 'max': 0.0}, {'current': 2762.278, 'min': 0.0, 'max': 0.0}, {'current': 3275.22, 'min': 0.0, 'max': 0.0}, {'current': 3300.85, 'min': 0.0, 'max': 0.0}, {'current': 3291.939, 'min': 0.0, 'max': 0.0}, {'current': 2973.521, 'min': 0.0, 'max': 0.0}, {'current': 2966.002, 'min': 0.0, 'max': 0.0}, {'current': 2966.843, 'min': 0.0, 'max': 0.0}, {'current': 2645.143, 'min': 0.0, 'max': 0.0}, {'current': 3046.118, 'min': 0.0, 'max': 0.0}, {'current': 3006.852, 'min': 0.0, 'max': 0.0}, {'current': 3296.715, 'min': 0.0, 'max': 0.0}, {'current': 2922.754, 'min': 0.0, 'max': 0.0}, {'current': 2906.522, 'min': 0.0, 'max': 0.0}, {'current': 3028.907, 'min': 0.0, 'max': 0.0}, {'current': 2966.081, 'min': 0.0, 'max': 0.0}, {'current': 2917.105, 'min': 0.0, 'max': 0.0}, {'current': 3299.43, 'min': 0.0, 'max': 0.0}, 
{'current': 3300.481, 'min': 0.0, 'max': 0.0}, {'current': 3270.344, 'min': 0.0, 'max': 0.0}, {'current': 2930.864, 'min': 0.0, 'max': 0.0}, {'current': 2879.041, 'min': 0.0, 'max': 0.0}, {'current': 2902.742, 'min': 0.0, 'max': 0.0}, {'current': 3300.401, 'min': 0.0, 'max': 0.0}, {'current': 2686.543, 'min': 0.0, 'max': 0.0}, {'current': 3222.046, 'min': 0.0, 'max': 0.0}, {'current': 3298.97, 'min': 0.0, 'max': 0.0}, {'current': 3298.666, 'min': 0.0, 'max': 0.0}, {'current': 2754.074, 'min': 0.0, 'max': 0.0}, {'current': 3299.533, 'min': 0.0, 'max': 0.0}, {'current': 2812.149, 'min': 0.0, 'max': 0.0}, {'current': 3300.31, 'min': 0.0, 'max': 0.0}, {'current': 3300.208, 'min': 0.0, 'max': 0.0}, {'current': 2779.101, 'min': 0.0, 'max': 0.0}, {'current': 3300.477, 'min': 0.0, 'max': 0.0}, {'current': 2825.936, 'min': 0.0, 'max': 0.0}, {'current': 2204.979, 'min': 0.0, 'max': 0.0}, {'current': 2851.77, 'min': 0.0, 'max': 0.0}, {'current': 2797.024, 'min': 0.0, 'max': 0.0}, {'current': 2325.643, 'min': 0.0, 'max': 0.0}, {'current': 2850.865, 'min': 0.0, 'max': 0.0}, {'current': 2919.634, 'min': 0.0, 'max': 0.0}, {'current': 2910.972, 'min': 0.0, 'max': 0.0}, {'current': 2523.164, 'min': 0.0, 'max': 0.0}, {'current': 2297.34, 'min': 0.0, 'max': 0.0}, {'current': 2193.979, 'min': 0.0, 'max': 0.0}, {'current': 2128.798, 'min': 0.0, 'max': 0.0}, {'current': 1907.218, 'min': 0.0, 'max': 0.0}, {'current': 2921.246, 'min': 0.0, 'max': 0.0}, {'current': 2408.454, 'min': 0.0, 'max': 0.0}, {'current': 2296.906, 'min': 0.0, 'max': 0.0}, {'current': 2877.315, 'min': 0.0, 'max': 0.0}, {'current': 2985.576, 'min': 0.0, 'max': 0.0}, {'current': 2977.194, 'min': 0.0, 'max': 0.0}, {'current': 2982.705, 'min': 0.0, 'max': 0.0}, {'current': 2367.542, 'min': 0.0, 'max': 0.0}, {'current': 2232.475, 'min': 0.0, 'max': 0.0}, {'current': 2720.158, 'min': 0.0, 'max': 0.0}, {'current': 2260.753, 'min': 0.0, 'max': 0.0}, {'current': 2215.697, 'min': 0.0, 'max': 0.0}, {'current': 2278.892, 'min': 
0.0, 'max': 0.0}, {'current': 2009.932, 'min': 0.0, 'max': 0.0}, {'current': 2813.45, 'min': 0.0, 'max': 0.0}, {'current': 2248.538, 'min': 0.0, 'max': 0.0}, {'current': 2789.291, 'min': 0.0, 'max': 0.0}, {'current': 2481.076, 'min': 0.0, 'max': 0.0}, {'current': 2033.475, 'min': 0.0, 'max': 0.0}, {'current': 2214.296, 'min': 0.0, 'max': 0.0}, {'current': 2762.868, 'min': 0.0, 'max': 0.0}, {'current': 2273.931, 'min': 0.0, 'max': 0.0}, {'current': 2891.192, 'min': 0.0, 'max': 0.0}, {'current': 2217.993, 'min': 0.0, 'max': 0.0}, {'current': 2306.666, 'min': 0.0, 'max': 0.0}, {'current': 2372.976, 'min': 0.0, 'max': 0.0}, {'current': 2322.672, 'min': 0.0, 'max': 0.0}, {'current': 2325.945, 'min': 0.0, 'max': 0.0}, {'current': 2332.493, 'min': 0.0, 'max': 0.0}, {'current': 2202.398, 'min': 0.0, 'max': 0.0}, {'current': 2130.875, 'min': 0.0, 'max': 0.0}, {'current': 2034.318, 'min': 0.0, 'max': 0.0}, {'current': 2539.829, 'min': 0.0, 'max': 0.0}, {'current': 2088.35, 'min': 0.0, 'max': 0.0}, {'current': 2427.524, 'min': 0.0, 'max': 0.0}, {'current': 2432.02, 'min': 0.0, 'max': 0.0}, {'current': 2521.716, 'min': 0.0, 'max': 0.0}, {'current': 3047.178, 'min': 0.0, 'max': 0.0}, {'current': 2452.92, 'min': 0.0, 'max': 0.0}, {'current': 2398.052, 'min': 0.0, 'max': 0.0}, {'current': 2930.232, 'min': 0.0, 'max': 0.0}, {'current': 2915.194, 'min': 0.0, 'max': 0.0}, {'current': 3050.935, 'min': 0.0, 'max': 0.0}, {'current': 2985.592, 'min': 0.0, 'max': 0.0}, {'current': 2999.519, 'min': 0.0, 'max': 0.0}, {'current': 2954.304, 'min': 0.0, 'max': 0.0}, {'current': 3253.761, 'min': 0.0, 'max': 0.0}, {'current': 2547.987, 'min': 0.0, 'max': 0.0}, {'current': 2791.034, 'min': 0.0, 'max': 0.0}, {'current': 2669.218, 'min': 0.0, 'max': 0.0}, {'current': 3304.846, 'min': 0.0, 'max': 0.0}, {'current': 3017.308, 'min': 0.0, 'max': 0.0}, {'current': 3299.861, 'min': 0.0, 'max': 0.0}, {'current': 2977.232, 'min': 0.0, 'max': 0.0}, {'current': 2939.823, 'min': 0.0, 'max': 0.0}, {'current': 
3300.543, 'min': 0.0, 'max': 0.0}, {'current': 3014.24, 'min': 0.0, 'max': 0.0}, {'current': 3299.908, 'min': 0.0, 'max': 0.0}, {'current': 3014.885, 'min': 0.0, 'max': 0.0}, {'current': 3297.521, 'min': 0.0, 'max': 0.0}, {'current': 3296.848, 'min': 0.0, 'max': 0.0}, {'current': 3297.858, 'min': 0.0, 'max': 0.0}, {'current': 3296.813, 'min': 0.0, 'max': 0.0}, {'current': 2998.973, 'min': 0.0, 'max': 0.0}, {'current': 3299.759, 'min': 0.0, 'max': 0.0}, {'current': 3026.427, 'min': 0.0, 'max': 0.0}, {'current': 3300.35, 'min': 0.0, 'max': 0.0}, {'current': 2507.162, 'min': 0.0, 'max': 0.0}, {'current': 3250.875, 'min': 0.0, 'max': 0.0}, {'current': 3299.582, 'min': 0.0, 'max': 0.0}, {'current': 3299.791, 'min': 0.0, 'max': 0.0}, {'current': 2876.895, 'min': 0.0, 'max': 0.0}, {'current': 3300.637, 'min': 0.0, 'max': 0.0}, {'current': 3299.935, 'min': 0.0, 'max': 0.0}, {'current': 3299.409, 'min': 0.0, 'max': 0.0}, {'current': 3299.545, 'min': 0.0, 'max': 0.0}, {'current': 2845.582, 'min': 0.0, 'max': 0.0}, {'current': 3298.789, 'min': 0.0, 'max': 0.0}, {'current': 3212.048, 'min': 0.0, 'max': 0.0}, {'current': 2598.735, 'min': 0.0, 'max': 0.0}, {'current': 3299.632, 'min': 0.0, 'max': 0.0}, {'current': 3299.179, 'min': 0.0, 'max': 0.0}, {'current': 3298.805, 'min': 0.0, 'max': 0.0}, {'current': 3296.982, 'min': 0.0, 'max': 0.0}, {'current': 2498.549, 'min': 0.0, 'max': 0.0}, {'current': 3296.222, 'min': 0.0, 'max': 0.0}, {'current': 3297.448, 'min': 0.0, 'max': 0.0}, {'current': 2830.786, 'min': 0.0, 'max': 0.0}, {'current': 3299.116, 'min': 0.0, 'max': 0.0}, {'current': 3299.39, 'min': 0.0, 'max': 0.0}, {'current': 3299.373, 'min': 0.0, 'max': 0.0}], 'disk': {'/': {'total': 32.0, 'used': 0.012481689453125}}, 'gpu': 'NVIDIA A10G', 'gpu_count': 8, 'gpu_devices': [{'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 
24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}, {'name': 'NVIDIA A10G', 'memory_total': 24146608128}], 'memory': {'total': 747.9597625732422}}
2024-02-08 17:52:38,634 INFO    HandlerThread:1317 [system_monitor.py:probe():224] Finished collecting system info
2024-02-08 17:52:38,634 INFO    HandlerThread:1317 [system_monitor.py:probe():227] Publishing system info
2024-02-08 17:52:38,634 DEBUG   HandlerThread:1317 [system_info.py:_save_pip():52] Saving list of pip packages installed into the current environment
2024-02-08 17:52:38,634 DEBUG   HandlerThread:1317 [system_info.py:_save_pip():68] Saving pip packages done
2024-02-08 17:52:38,634 DEBUG   HandlerThread:1317 [system_info.py:_save_conda():75] Saving list of conda packages installed into the current environment
2024-02-08 17:52:39,456 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml
2024-02-08 17:52:39,457 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/requirements.txt
2024-02-08 17:52:52,948 DEBUG   HandlerThread:1317 [system_info.py:_save_conda():87] Saving conda packages done
2024-02-08 17:52:52,950 INFO    HandlerThread:1317 [system_monitor.py:probe():229] Finished publishing system info
2024-02-08 17:52:52,954 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:52,954 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 17:52:52,954 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:52,954 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: keepalive
2024-02-08 17:52:52,955 DEBUG   SenderThread:1317 [sender.py:send():382] send: files
2024-02-08 17:52:52,955 INFO    SenderThread:1317 [sender.py:_save_file():1392] saving file wandb-metadata.json with policy now
2024-02-08 17:52:52,961 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: stop_status
2024-02-08 17:52:52,962 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: stop_status
2024-02-08 17:52:52,964 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 17:52:53,118 DEBUG   SenderThread:1317 [sender.py:send():382] send: telemetry
2024-02-08 17:52:53,118 DEBUG   SenderThread:1317 [sender.py:send():382] send: config
2024-02-08 17:52:53,118 DEBUG   SenderThread:1317 [sender.py:send():382] send: metric
2024-02-08 17:52:53,118 DEBUG   SenderThread:1317 [sender.py:send():382] send: telemetry
2024-02-08 17:52:53,119 DEBUG   SenderThread:1317 [sender.py:send():382] send: metric
2024-02-08 17:52:53,119 WARNING SenderThread:1317 [sender.py:send_metric():1343] Seen metric with glob (shouldn't happen)
2024-02-08 17:52:53,356 INFO    wandb-upload_0:1317 [upload_job.py:push():131] Uploaded file /tmp/tmpftpllcuxwandb/1bgc597r-wandb-metadata.json
2024-02-08 17:52:53,459 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml
2024-02-08 17:52:53,459 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-metadata.json
2024-02-08 17:52:53,459 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:53,833 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:55,459 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:55,914 DEBUG   SenderThread:1317 [sender.py:send():382] send: exit
2024-02-08 17:52:55,914 INFO    SenderThread:1317 [sender.py:send_exit():589] handling exit code: 1
2024-02-08 17:52:55,914 INFO    SenderThread:1317 [sender.py:send_exit():591] handling runtime: 17
2024-02-08 17:52:55,915 INFO    SenderThread:1317 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 17:52:55,915 INFO    SenderThread:1317 [sender.py:send_exit():597] send defer
2024-02-08 17:52:55,915 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,915 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 0
2024-02-08 17:52:55,916 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,916 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 0
2024-02-08 17:52:55,916 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 1
2024-02-08 17:52:55,916 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,916 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 1
2024-02-08 17:52:55,916 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,916 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 1
2024-02-08 17:52:55,916 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 2
2024-02-08 17:52:55,916 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,916 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 2
2024-02-08 17:52:55,916 INFO    HandlerThread:1317 [system_monitor.py:finish():203] Stopping system monitor
2024-02-08 17:52:55,917 INFO    HandlerThread:1317 [interfaces.py:finish():202] Joined cpu monitor
2024-02-08 17:52:55,917 INFO    HandlerThread:1317 [interfaces.py:finish():202] Joined disk monitor
2024-02-08 17:52:55,918 DEBUG   SystemMonitor:1317 [system_monitor.py:_start():172] Starting system metrics aggregation loop
2024-02-08 17:52:55,918 DEBUG   SystemMonitor:1317 [system_monitor.py:_start():179] Finished system metrics aggregation loop
2024-02-08 17:52:55,918 DEBUG   SystemMonitor:1317 [system_monitor.py:_start():183] Publishing last batch of metrics
2024-02-08 17:52:55,956 INFO    HandlerThread:1317 [interfaces.py:finish():202] Joined gpu monitor
2024-02-08 17:52:55,956 INFO    HandlerThread:1317 [interfaces.py:finish():202] Joined memory monitor
2024-02-08 17:52:55,956 INFO    HandlerThread:1317 [interfaces.py:finish():202] Joined network monitor
2024-02-08 17:52:55,957 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,957 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 2
2024-02-08 17:52:55,957 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 3
2024-02-08 17:52:55,957 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,958 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 3
2024-02-08 17:52:55,958 DEBUG   SenderThread:1317 [sender.py:send():382] send: stats
2024-02-08 17:52:55,959 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,959 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 3
2024-02-08 17:52:55,959 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 4
2024-02-08 17:52:55,959 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,959 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 4
2024-02-08 17:52:55,959 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,959 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 4
2024-02-08 17:52:55,959 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 5
2024-02-08 17:52:55,959 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,959 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 5
2024-02-08 17:52:55,960 DEBUG   SenderThread:1317 [sender.py:send():382] send: summary
2024-02-08 17:52:55,961 INFO    SenderThread:1317 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
2024-02-08 17:52:55,961 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,961 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 5
2024-02-08 17:52:55,961 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 6
2024-02-08 17:52:55,961 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:55,961 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 6
2024-02-08 17:52:55,961 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:55,961 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 6
2024-02-08 17:52:55,966 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: status_report
2024-02-08 17:52:56,102 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 7
2024-02-08 17:52:56,102 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:56,102 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 7
2024-02-08 17:52:56,103 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:56,103 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 7
2024-02-08 17:52:56,459 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/config.yaml
2024-02-08 17:52:56,459 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_created():271] file/dir created: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-summary.json
2024-02-08 17:52:56,914 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:57,129 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 8
2024-02-08 17:52:57,129 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:57,130 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:57,130 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 8
2024-02-08 17:52:57,130 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:57,130 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 8
2024-02-08 17:52:57,130 INFO    SenderThread:1317 [job_builder.py:build():298] Attempting to build job artifact
2024-02-08 17:52:57,131 INFO    SenderThread:1317 [job_builder.py:_get_source_type():439] no source found
2024-02-08 17:52:57,131 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 9
2024-02-08 17:52:57,131 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:57,131 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 9
2024-02-08 17:52:57,132 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:57,132 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 9
2024-02-08 17:52:57,132 INFO    SenderThread:1317 [dir_watcher.py:finish():358] shutting down directory watcher
2024-02-08 17:52:57,460 INFO    Thread-12 :1317 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:57,460 INFO    SenderThread:1317 [dir_watcher.py:finish():388] scan: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files
2024-02-08 17:52:57,460 INFO    SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/config.yaml config.yaml
2024-02-08 17:52:57,460 INFO    SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/requirements.txt requirements.txt
2024-02-08 17:52:57,460 INFO    SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml conda-environment.yaml
2024-02-08 17:52:57,461 INFO    SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-metadata.json wandb-metadata.json
2024-02-08 17:52:57,461 INFO    SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log output.log
2024-02-08 17:52:57,463 INFO    SenderThread:1317 [dir_watcher.py:finish():402] scan save: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-summary.json wandb-summary.json
2024-02-08 17:52:57,464 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 10
2024-02-08 17:52:57,467 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:57,467 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 10
2024-02-08 17:52:57,468 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:57,468 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 10
2024-02-08 17:52:57,468 INFO    SenderThread:1317 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 17:52:57,674 INFO    wandb-upload_0:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/requirements.txt
2024-02-08 17:52:57,753 INFO    wandb-upload_1:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/config.yaml
2024-02-08 17:52:57,791 INFO    wandb-upload_3:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/output.log
2024-02-08 17:52:57,800 INFO    wandb-upload_2:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/conda-environment.yaml
2024-02-08 17:52:57,804 INFO    wandb-upload_4:1317 [upload_job.py:push():131] Uploaded file /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/files/wandb-summary.json
2024-02-08 17:52:57,915 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:57,915 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:58,004 INFO    Thread-11 (_thread_body):1317 [sender.py:transition_state():617] send defer: 11
2024-02-08 17:52:58,004 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,004 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 11
2024-02-08 17:52:58,005 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,005 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 11
2024-02-08 17:52:58,005 INFO    SenderThread:1317 [file_pusher.py:join():181] waiting for file pusher
2024-02-08 17:52:58,005 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 12
2024-02-08 17:52:58,005 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,005 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 12
2024-02-08 17:52:58,006 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,006 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 12
2024-02-08 17:52:58,006 INFO    SenderThread:1317 [file_stream.py:finish():595] file stream finish called
2024-02-08 17:52:58,071 INFO    SenderThread:1317 [file_stream.py:finish():599] file stream finish is done
2024-02-08 17:52:58,071 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 13
2024-02-08 17:52:58,071 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,071 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 13
2024-02-08 17:52:58,071 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,071 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 13
2024-02-08 17:52:58,071 INFO    SenderThread:1317 [sender.py:transition_state():617] send defer: 14
2024-02-08 17:52:58,071 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: defer
2024-02-08 17:52:58,071 INFO    HandlerThread:1317 [handler.py:handle_request_defer():172] handle defer: 14
2024-02-08 17:52:58,072 DEBUG   SenderThread:1317 [sender.py:send():382] send: final
2024-02-08 17:52:58,072 DEBUG   SenderThread:1317 [sender.py:send():382] send: footer
2024-02-08 17:52:58,072 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: defer
2024-02-08 17:52:58,072 INFO    SenderThread:1317 [sender.py:send_request_defer():613] handle sender defer: 14
2024-02-08 17:52:58,072 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:58,072 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:58,073 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: poll_exit
2024-02-08 17:52:58,073 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: poll_exit
2024-02-08 17:52:58,073 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: server_info
2024-02-08 17:52:58,073 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: server_info
2024-02-08 17:52:58,075 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: get_summary
2024-02-08 17:52:58,075 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: sampled_history
2024-02-08 17:52:58,076 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: internal_messages
2024-02-08 17:52:58,076 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: job_info
2024-02-08 17:52:58,121 DEBUG   SenderThread:1317 [sender.py:send_request():409] send_request: job_info
2024-02-08 17:52:58,122 INFO    MainThread:1317 [wandb_run.py:_footer_history_summary_info():3837] rendering history
2024-02-08 17:52:58,122 INFO    MainThread:1317 [wandb_run.py:_footer_history_summary_info():3869] rendering summary
2024-02-08 17:52:58,122 INFO    MainThread:1317 [wandb_run.py:_footer_sync_info():3796] logging synced files
2024-02-08 17:52:58,122 DEBUG   HandlerThread:1317 [handler.py:handle_request():146] handle_request: shutdown
2024-02-08 17:52:58,122 INFO    HandlerThread:1317 [handler.py:finish():866] shutting down handler
2024-02-08 17:52:59,076 INFO    WriterThread:1317 [datastore.py:close():294] close: /home/sagemaker-user/output-7b-26k-lora/wandb/run-20240208_175238-v53k76w9/run-v53k76w9.wandb
2024-02-08 17:52:59,122 INFO    SenderThread:1317 [sender.py:finish():1548] shutting down sender
2024-02-08 17:52:59,122 INFO    SenderThread:1317 [file_pusher.py:finish():175] shutting down file pusher
2024-02-08 17:52:59,122 INFO    SenderThread:1317 [file_pusher.py:join():181] waiting for file pusher