What format is this compiled under?
Your model:
DEBUG:root:Iter 1: Val loss 3.559, Val took 22.742s
DEBUG:root:Iter 10: Train loss 3.465, It/sec 0.671, Tokens/sec 126.654
DEBUG:root:Iter 20: Train loss 3.460, It/sec 0.626, Tokens/sec 111.368
DEBUG:root:Iter 30: Train loss 3.137, It/sec 0.651, Tokens/sec 120.751
DEBUG:root:Iter 40: Train loss 2.746, It/sec 0.581, Tokens/sec 111.607
DEBUG:root:Iter 50: Train loss 2.529, It/sec 0.636, Tokens/sec 105.875
DEBUG:root:Iter 60: Train loss 2.651, It/sec 0.609, Tokens/sec 109.633
DEBUG:root:Iter 70: Train loss 2.638, It/sec 0.632, Tokens/sec 106.885
DEBUG:root:Iter 80: Train loss 2.703, It/sec 0.633, Tokens/sec 115.000
DEBUG:root:Iter 90: Train loss 2.491, It/sec 0.482, Tokens/sec 98.055
DEBUG:root:Iter 100: Train loss 2.583, It/sec 0.568, Tokens/sec 114.182
DEBUG:root:Iter 100: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 110: Train loss 2.642, It/sec 0.545, Tokens/sec 100.620
DEBUG:root:Iter 120: Train loss 2.588, It/sec 0.637, Tokens/sec 114.419
DEBUG:root:Iter 130: Train loss 2.645, It/sec 0.660, Tokens/sec 109.543
DEBUG:root:Iter 140: Train loss 2.634, It/sec 0.589, Tokens/sec 115.651
DEBUG:root:Iter 150: Train loss 2.517, It/sec 0.741, Tokens/sec 119.367
DEBUG:root:Iter 160: Train loss 2.686, It/sec 0.631, Tokens/sec 110.660
DEBUG:root:Iter 170: Train loss 2.291, It/sec 0.640, Tokens/sec 112.617
DEBUG:root:Iter 180: Train loss 2.344, It/sec 0.680, Tokens/sec 111.138
DEBUG:root:Iter 190: Train loss 2.488, It/sec 0.726, Tokens/sec 119.050
DEBUG:root:Iter 200: Train loss 2.454, It/sec 0.623, Tokens/sec 110.476
DEBUG:root:Iter 200: Val loss 2.304, Val took 22.077s
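Before comparing the two runs, note that It/sec depends on how many tokens each iteration processes. Dividing Tokens/sec by It/sec gives the average number of tokens per iteration. Below is a minimal sketch that assumes the `DEBUG:root:` log format shown above. `parse_train_lines` and `tokens_per_iteration` are hypothetical helpers, not part of mlx-examples.

```python
import re
from statistics import mean

# Assumed log format (copied from the lines above), e.g.:
# DEBUG:root:Iter 10: Train loss 3.465, It/sec 0.671, Tokens/sec 126.654
TRAIN_RE = re.compile(
    r"Iter (\d+): Train loss ([\d.]+), It/sec ([\d.]+), Tokens/sec ([\d.]+)"
)


def parse_train_lines(log_text):
    """Return (iteration, loss, it_per_sec, tokens_per_sec) tuples in log order.

    Val-loss and checkpoint lines do not match the pattern and are skipped.
    """
    rows = []
    for m in TRAIN_RE.finditer(log_text):
        it, loss, ips, tps = m.groups()
        rows.append((int(it), float(loss), float(ips), float(tps)))
    return rows


def tokens_per_iteration(rows):
    """Mean of (tokens/sec / it/sec) over all reported intervals."""
    return mean(tps / ips for _, _, ips, tps in rows)


if __name__ == "__main__":
    sample = (
        "DEBUG:root:Iter 10: Train loss 3.465, It/sec 0.671, Tokens/sec 126.654\n"
        "DEBUG:root:Iter 20: Train loss 3.460, It/sec 0.626, Tokens/sec 111.368\n"
    )
    print(tokens_per_iteration(parse_train_lines(sample)))
```

In the first log, this ratio is roughly 177 to 189 tokens per iteration. At iteration 200 of the second log, it is 21.464 / 5.366 = 4.0. The two runs therefore process very different amounts of text per step, so their It/sec figures are not directly comparable.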
llama-8B-Instruct, converted using mlx-examples:
DEBUG:root:Iter 1: Val loss 6.291, Val took 2.289s
DEBUG:root:Iter 10: Train loss 5.658, It/sec 5.986, Tokens/sec 23.945
DEBUG:root:Iter 20: Train loss 2.697, It/sec 5.338, Tokens/sec 21.351
DEBUG:root:Iter 30: Train loss 1.739, It/sec 5.375, Tokens/sec 21.499
DEBUG:root:Iter 40: Train loss 1.378, It/sec 5.369, Tokens/sec 21.474
DEBUG:root:Iter 50: Train loss 0.940, It/sec 5.367, Tokens/sec 21.467
DEBUG:root:Iter 60: Train loss 1.172, It/sec 5.345, Tokens/sec 21.379
DEBUG:root:Iter 70: Train loss 0.845, It/sec 5.342, Tokens/sec 21.370
DEBUG:root:Iter 80: Train loss 0.784, It/sec 5.314, Tokens/sec 21.258
DEBUG:root:Iter 90: Train loss 1.071, It/sec 5.334, Tokens/sec 21.337
DEBUG:root:Iter 100: Train loss 1.362, It/sec 5.306, Tokens/sec 21.223
DEBUG:root:Iter 100: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 110: Train loss 1.340, It/sec 5.252, Tokens/sec 21.008
DEBUG:root:Iter 120: Train loss 1.407, It/sec 5.323, Tokens/sec 21.292
DEBUG:root:Iter 130: Train loss 1.119, It/sec 5.338, Tokens/sec 21.352
DEBUG:root:Iter 140: Train loss 1.054, It/sec 5.360, Tokens/sec 21.441
DEBUG:root:Iter 150: Train loss 0.864, It/sec 5.366, Tokens/sec 21.463
DEBUG:root:Iter 160: Train loss 0.805, It/sec 5.360, Tokens/sec 21.438
DEBUG:root:Iter 170: Train loss 0.883, It/sec 5.366, Tokens/sec 21.463
DEBUG:root:Iter 180: Train loss 1.030, It/sec 5.360, Tokens/sec 21.440
DEBUG:root:Iter 190: Train loss 1.338, It/sec 5.316, Tokens/sec 21.265
DEBUG:root:Iter 200: Train loss 0.919, It/sec 5.366, Tokens/sec 21.464
DEBUG:root:Iter 200: Val loss 1.229, Val took 2.304s
DEBUG:root:Iter 200: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 210: Train loss 1.195, It/sec 5.336, Tokens/sec 21.342
DEBUG:root:Iter 220: Train loss 1.212, It/sec 5.364, Tokens/sec 21.457
DEBUG:root:Iter 230: Train loss 0.943, It/sec 5.336, Tokens/sec 21.344
DEBUG:root:Iter 240: Train loss 1.271, It/sec 5.343, Tokens/sec 21.373
DEBUG:root:Iter 250: Train loss 0.933, It/sec 5.350, Tokens/sec 21.401
DEBUG:root:Iter 260: Train loss 1.032, It/sec 5.362, Tokens/sec 21.448
DEBUG:root:Iter 270: Train loss 1.674, It/sec 5.361, Tokens/sec 21.444
DEBUG:root:Iter 280: Train loss 0.728, It/sec 5.344, Tokens/sec 21.378
DEBUG:root:Iter 290: Train loss 1.426, It/sec 5.362, Tokens/sec 21.448
DEBUG:root:Iter 300: Train loss 0.931, It/sec 5.366, Tokens/sec 21.465
DEBUG:root:Iter 300: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 310: Train loss 1.212, It/sec 5.338, Tokens/sec 21.353
DEBUG:root:Iter 320: Train loss 1.543, It/sec 5.357, Tokens/sec 21.429
DEBUG:root:Iter 330: Train loss 1.125, It/sec 5.358, Tokens/sec 21.434
DEBUG:root:Iter 340: Train loss 0.940, It/sec 5.365, Tokens/sec 21.462
DEBUG:root:Iter 350: Train loss 0.907, It/sec 5.364, Tokens/sec 21.458
DEBUG:root:Iter 360: Train loss 1.013, It/sec 5.355, Tokens/sec 21.419
DEBUG:root:Iter 370: Train loss 0.795, It/sec 5.324, Tokens/sec 21.295
DEBUG:root:Iter 380: Train loss 0.911, It/sec 5.358, Tokens/sec 21.431
DEBUG:root:Iter 390: Train loss 0.806, It/sec 5.334, Tokens/sec 21.334
DEBUG:root:Iter 400: Train loss 0.965, It/sec 5.364, Tokens/sec 21.455
DEBUG:root:Iter 400: Val loss 1.094, Val took 2.298s
DEBUG:root:Iter 400: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 410: Train loss 0.892, It/sec 5.341, Tokens/sec 21.362
DEBUG:root:Iter 420: Train loss 0.937, It/sec 5.374, Tokens/sec 21.494
DEBUG:root:Iter 430: Train loss 1.024, It/sec 5.371, Tokens/sec 21.484
DEBUG:root:Iter 440: Train loss 1.091, It/sec 5.372, Tokens/sec 21.490
DEBUG:root:Iter 450: Train loss 1.095, It/sec 5.367, Tokens/sec 21.466
DEBUG:root:Iter 460: Train loss 1.113, It/sec 5.367, Tokens/sec 21.468
DEBUG:root:Iter 470: Train loss 1.000, It/sec 5.376, Tokens/sec 21.503
DEBUG:root:Iter 480: Train loss 1.160, It/sec 5.368, Tokens/sec 21.474
DEBUG:root:Iter 490: Train loss 0.959, It/sec 5.371, Tokens/sec 21.482
DEBUG:root:Iter 500: Train loss 0.979, It/sec 5.375, Tokens/sec 21.501
DEBUG:root:Iter 500: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 510: Train loss 1.156, It/sec 5.346, Tokens/sec 21.383
DEBUG:root:Iter 520: Train loss 0.998, It/sec 5.377, Tokens/sec 21.508
DEBUG:root:Iter 530: Train loss 1.140, It/sec 5.375, Tokens/sec 21.502
DEBUG:root:Iter 540: Train loss 1.027, It/sec 5.373, Tokens/sec 21.494
DEBUG:root:Iter 550: Train loss 1.285, It/sec 5.333, Tokens/sec 21.332
DEBUG:root:Iter 560: Train loss 1.446, It/sec 5.362, Tokens/sec 21.449
DEBUG:root:Iter 570: Train loss 1.094, It/sec 5.376, Tokens/sec 21.504
DEBUG:root:Iter 580: Train loss 1.279, It/sec 5.367, Tokens/sec 21.468
DEBUG:root:Iter 590: Train loss 0.925, It/sec 5.364, Tokens/sec 21.455
DEBUG:root:Iter 600: Train loss 1.319, It/sec 5.369, Tokens/sec 21.474
DEBUG:root:Iter 600: Val loss 1.076, Val took 2.295s
DEBUG:root:Iter 600: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 610: Train loss 0.857, It/sec 5.343, Tokens/sec 21.371
DEBUG:root:Iter 620: Train loss 1.449, It/sec 5.373, Tokens/sec 21.493
DEBUG:root:Iter 630: Train loss 0.749, It/sec 5.362, Tokens/sec 21.447
DEBUG:root:Iter 640: Train loss 1.190, It/sec 5.371, Tokens/sec 21.484
DEBUG:root:Iter 650: Train loss 0.962, It/sec 5.368, Tokens/sec 21.471
DEBUG:root:Iter 660: Train loss 0.958, It/sec 5.359, Tokens/sec 21.435
DEBUG:root:Iter 670: Train loss 1.378, It/sec 5.363, Tokens/sec 21.451
DEBUG:root:Iter 680: Train loss 1.180, It/sec 5.375, Tokens/sec 21.499
DEBUG:root:Iter 690: Train loss 1.184, It/sec 5.344, Tokens/sec 21.375
DEBUG:root:Iter 700: Train loss 1.344, It/sec 5.364, Tokens/sec 21.457
DEBUG:root:Iter 700: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 710: Train loss 1.048, It/sec 5.348, Tokens/sec 21.392
DEBUG:root:Iter 720: Train loss 0.909, It/sec 5.375, Tokens/sec 21.499
DEBUG:root:Iter 730: Train loss 0.805, It/sec 5.334, Tokens/sec 21.338
DEBUG:root:Iter 740: Train loss 1.356, It/sec 5.372, Tokens/sec 21.486
DEBUG:root:Iter 750: Train loss 1.170, It/sec 5.373, Tokens/sec 21.491
DEBUG:root:Iter 760: Train loss 0.899, It/sec 5.373, Tokens/sec 21.492
DEBUG:root:Iter 770: Train loss 0.930, It/sec 5.370, Tokens/sec 21.480
DEBUG:root:Iter 780: Train loss 0.755, It/sec 5.392, Tokens/sec 21.569
DEBUG:root:Iter 790: Train loss 1.149, It/sec 5.382, Tokens/sec 21.529
DEBUG:root:Iter 800: Train loss 1.041, It/sec 5.362, Tokens/sec 21.448
DEBUG:root:Iter 800: Val loss 1.078, Val took 2.298s
DEBUG:root:Iter 800: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 810: Train loss 0.850, It/sec 5.343, Tokens/sec 21.371
DEBUG:root:Iter 820: Train loss 1.292, It/sec 5.373, Tokens/sec 21.492
DEBUG:root:Iter 830: Train loss 0.888, It/sec 5.365, Tokens/sec 21.460
DEBUG:root:Iter 840: Train loss 1.159, It/sec 5.370, Tokens/sec 21.479
DEBUG:root:Iter 850: Train loss 1.118, It/sec 5.370, Tokens/sec 21.481
DEBUG:root:Iter 860: Train loss 1.242, It/sec 5.362, Tokens/sec 21.447
DEBUG:root:Iter 870: Train loss 0.836, It/sec 5.374, Tokens/sec 21.495
DEBUG:root:Iter 880: Train loss 0.866, It/sec 5.373, Tokens/sec 21.492
DEBUG:root:Iter 890: Train loss 1.238, It/sec 5.357, Tokens/sec 21.429
DEBUG:root:Iter 900: Train loss 0.952, It/sec 5.326, Tokens/sec 21.304
DEBUG:root:Iter 900: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 910: Train loss 0.919, It/sec 5.348, Tokens/sec 21.392
DEBUG:root:Iter 920: Train loss 0.893, It/sec 5.372, Tokens/sec 21.489
DEBUG:root:Iter 930: Train loss 0.893, It/sec 5.371, Tokens/sec 21.486
DEBUG:root:Iter 940: Train loss 1.402, It/sec 5.365, Tokens/sec 21.459
DEBUG:root:Iter 950: Train loss 1.204, It/sec 5.367, Tokens/sec 21.468
DEBUG:root:Iter 960: Train loss 1.019, It/sec 5.376, Tokens/sec 21.505
DEBUG:root:Iter 970: Train loss 1.098, It/sec 5.370, Tokens/sec 21.479
DEBUG:root:Iter 980: Train loss 1.058, It/sec 5.378, Tokens/sec 21.510
DEBUG:root:Iter 990: Train loss 1.048, It/sec 5.370, Tokens/sec 21.479
DEBUG:root:Iter 1000: Train loss 0.902, It/sec 5.359, Tokens/sec 21.436
DEBUG:root:Iter 1000: Val loss 1.095, Val took 2.294s
DEBUG:root:Iter 1000: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1010: Train loss 0.964, It/sec 5.339, Tokens/sec 21.357
DEBUG:root:Iter 1020: Train loss 0.767, It/sec 5.371, Tokens/sec 21.486
DEBUG:root:Iter 1030: Train loss 1.112, It/sec 5.377, Tokens/sec 21.506
DEBUG:root:Iter 1040: Train loss 1.344, It/sec 5.375, Tokens/sec 21.500
DEBUG:root:Iter 1050: Train loss 0.874, It/sec 5.378, Tokens/sec 21.512
DEBUG:root:Iter 1060: Train loss 0.939, It/sec 5.379, Tokens/sec 21.514
DEBUG:root:Iter 1070: Train loss 1.090, It/sec 5.381, Tokens/sec 21.523
DEBUG:root:Iter 1080: Train loss 0.819, It/sec 5.375, Tokens/sec 21.500
DEBUG:root:Iter 1090: Train loss 1.137, It/sec 5.327, Tokens/sec 21.306
DEBUG:root:Iter 1100: Train loss 1.231, It/sec 5.363, Tokens/sec 21.453
DEBUG:root:Iter 1100: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1110: Train loss 1.021, It/sec 5.346, Tokens/sec 21.383
DEBUG:root:Iter 1120: Train loss 1.075, It/sec 5.365, Tokens/sec 21.459
DEBUG:root:Iter 1130: Train loss 1.429, It/sec 5.374, Tokens/sec 21.497
DEBUG:root:Iter 1140: Train loss 0.886, It/sec 5.363, Tokens/sec 21.451
DEBUG:root:Iter 1150: Train loss 0.935, It/sec 5.374, Tokens/sec 21.494
DEBUG:root:Iter 1160: Train loss 1.338, It/sec 5.376, Tokens/sec 21.503
DEBUG:root:Iter 1170: Train loss 0.860, It/sec 5.370, Tokens/sec 21.479
DEBUG:root:Iter 1180: Train loss 1.156, It/sec 5.375, Tokens/sec 21.499
DEBUG:root:Iter 1190: Train loss 0.940, It/sec 5.379, Tokens/sec 21.515
DEBUG:root:Iter 1200: Train loss 1.042, It/sec 5.364, Tokens/sec 21.456
DEBUG:root:Iter 1200: Val loss 1.086, Val took 2.295s
DEBUG:root:Iter 1200: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1210: Train loss 0.731, It/sec 5.343, Tokens/sec 21.370
DEBUG:root:Iter 1220: Train loss 0.783, It/sec 5.368, Tokens/sec 21.473
DEBUG:root:Iter 1230: Train loss 0.991, It/sec 5.373, Tokens/sec 21.492
DEBUG:root:Iter 1240: Train loss 1.520, It/sec 5.371, Tokens/sec 21.486
DEBUG:root:Iter 1250: Train loss 1.464, It/sec 5.373, Tokens/sec 21.492
DEBUG:root:Iter 1260: Train loss 0.927, It/sec 5.313, Tokens/sec 21.250
DEBUG:root:Iter 1270: Train loss 0.712, It/sec 5.366, Tokens/sec 21.464
DEBUG:root:Iter 1280: Train loss 0.962, It/sec 5.375, Tokens/sec 21.500
DEBUG:root:Iter 1290: Train loss 1.195, It/sec 5.374, Tokens/sec 21.497
DEBUG:root:Iter 1300: Train loss 0.993, It/sec 5.350, Tokens/sec 21.401
DEBUG:root:Iter 1300: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1310: Train loss 0.821, It/sec 5.356, Tokens/sec 21.424
DEBUG:root:Iter 1320: Train loss 1.098, It/sec 5.366, Tokens/sec 21.466
DEBUG:root:Iter 1330: Train loss 0.969, It/sec 5.375, Tokens/sec 21.498
DEBUG:root:Iter 1340: Train loss 0.710, It/sec 5.365, Tokens/sec 21.460
DEBUG:root:Iter 1350: Train loss 0.782, It/sec 5.374, Tokens/sec 21.496
DEBUG:root:Iter 1360: Train loss 1.024, It/sec 5.375, Tokens/sec 21.500
DEBUG:root:Iter 1370: Train loss 1.628, It/sec 5.377, Tokens/sec 21.509
DEBUG:root:Iter 1380: Train loss 0.807, It/sec 5.376, Tokens/sec 21.502
DEBUG:root:Iter 1390: Train loss 1.416, It/sec 5.374, Tokens/sec 21.497
DEBUG:root:Iter 1400: Train loss 1.020, It/sec 5.376, Tokens/sec 21.503
DEBUG:root:Iter 1400: Val loss 1.135, Val took 2.296s
DEBUG:root:Iter 1400: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1410: Train loss 1.263, It/sec 5.341, Tokens/sec 21.363
DEBUG:root:Iter 1420: Train loss 1.260, It/sec 5.369, Tokens/sec 21.476
DEBUG:root:Iter 1430: Train loss 0.972, It/sec 5.377, Tokens/sec 21.507
DEBUG:root:Iter 1440: Train loss 1.162, It/sec 5.322, Tokens/sec 21.287
DEBUG:root:Iter 1450: Train loss 1.088, It/sec 5.373, Tokens/sec 21.493
DEBUG:root:Iter 1460: Train loss 1.133, It/sec 5.355, Tokens/sec 21.419
DEBUG:root:Iter 1470: Train loss 1.069, It/sec 5.373, Tokens/sec 21.491
DEBUG:root:Iter 1480: Train loss 0.785, It/sec 5.374, Tokens/sec 21.495
DEBUG:root:Iter 1490: Train loss 1.123, It/sec 5.374, Tokens/sec 21.496
DEBUG:root:Iter 1500: Train loss 1.520, It/sec 5.368, Tokens/sec 21.472
DEBUG:root:Iter 1500: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1510: Train loss 1.180, It/sec 5.341, Tokens/sec 21.363
DEBUG:root:Iter 1520: Train loss 0.879, It/sec 5.370, Tokens/sec 21.479
DEBUG:root:Iter 1530: Train loss 1.086, It/sec 5.376, Tokens/sec 21.503
DEBUG:root:Iter 1540: Train loss 0.965, It/sec 5.373, Tokens/sec 21.493
DEBUG:root:Iter 1550: Train loss 0.879, It/sec 5.371, Tokens/sec 21.485
DEBUG:root:Iter 1560: Train loss 0.791, It/sec 5.372, Tokens/sec 21.487
DEBUG:root:Iter 1570: Train loss 0.817, It/sec 5.373, Tokens/sec 21.490
DEBUG:root:Iter 1580: Train loss 1.046, It/sec 5.370, Tokens/sec 21.478
DEBUG:root:Iter 1590: Train loss 1.217, It/sec 5.371, Tokens/sec 21.485
DEBUG:root:Iter 1600: Train loss 0.952, It/sec 5.372, Tokens/sec 21.487
DEBUG:root:Iter 1600: Val loss 1.078, Val took 2.302s
DEBUG:root:Iter 1600: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1610: Train loss 0.835, It/sec 5.334, Tokens/sec 21.336
DEBUG:root:Iter 1620: Train loss 0.766, It/sec 5.329, Tokens/sec 21.318
DEBUG:root:Iter 1630: Train loss 1.527, It/sec 5.380, Tokens/sec 21.521
DEBUG:root:Iter 1640: Train loss 1.143, It/sec 5.372, Tokens/sec 21.487
DEBUG:root:Iter 1650: Train loss 1.025, It/sec 5.370, Tokens/sec 21.479
DEBUG:root:Iter 1660: Train loss 1.428, It/sec 5.372, Tokens/sec 21.487
DEBUG:root:Iter 1670: Train loss 0.919, It/sec 5.369, Tokens/sec 21.477
DEBUG:root:Iter 1680: Train loss 1.080, It/sec 5.377, Tokens/sec 21.507
DEBUG:root:Iter 1690: Train loss 1.513, It/sec 5.379, Tokens/sec 21.517
DEBUG:root:Iter 1700: Train loss 1.096, It/sec 5.371, Tokens/sec 21.483
DEBUG:root:Iter 1700: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1710: Train loss 1.054, It/sec 5.348, Tokens/sec 21.393
DEBUG:root:Iter 1720: Train loss 1.112, It/sec 5.374, Tokens/sec 21.494
DEBUG:root:Iter 1730: Train loss 1.082, It/sec 5.370, Tokens/sec 21.482
DEBUG:root:Iter 1740: Train loss 1.143, It/sec 5.357, Tokens/sec 21.430
DEBUG:root:Iter 1750: Train loss 0.643, It/sec 5.354, Tokens/sec 21.416
DEBUG:root:Iter 1760: Train loss 0.925, It/sec 5.371, Tokens/sec 21.484
DEBUG:root:Iter 1770: Train loss 0.905, It/sec 5.352, Tokens/sec 21.410
DEBUG:root:Iter 1780: Train loss 0.912, It/sec 5.366, Tokens/sec 21.463
DEBUG:root:Iter 1790: Train loss 0.844, It/sec 5.382, Tokens/sec 21.527
DEBUG:root:Iter 1800: Train loss 1.218, It/sec 5.372, Tokens/sec 21.487
DEBUG:root:Iter 1800: Val loss 1.098, Val took 2.302s
DEBUG:root:Iter 1800: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1810: Train loss 1.597, It/sec 5.303, Tokens/sec 21.211
DEBUG:root:Iter 1820: Train loss 1.547, It/sec 5.373, Tokens/sec 21.491
DEBUG:root:Iter 1830: Train loss 0.935, It/sec 5.375, Tokens/sec 21.499
DEBUG:root:Iter 1840: Train loss 1.218, It/sec 5.377, Tokens/sec 21.507
DEBUG:root:Iter 1850: Train loss 1.026, It/sec 5.370, Tokens/sec 21.478
DEBUG:root:Iter 1860: Train loss 0.694, It/sec 5.373, Tokens/sec 21.491
DEBUG:root:Iter 1870: Train loss 0.921, It/sec 5.349, Tokens/sec 21.397
DEBUG:root:Iter 1880: Train loss 1.160, It/sec 5.350, Tokens/sec 21.401
DEBUG:root:Iter 1890: Train loss 0.992, It/sec 5.361, Tokens/sec 21.443
DEBUG:root:Iter 1900: Train loss 1.059, It/sec 5.373, Tokens/sec 21.492
DEBUG:root:Iter 1900: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 1910: Train loss 0.890, It/sec 5.355, Tokens/sec 21.419
DEBUG:root:Iter 1920: Train loss 1.250, It/sec 5.347, Tokens/sec 21.386
DEBUG:root:Iter 1930: Train loss 0.743, It/sec 5.371, Tokens/sec 21.484
DEBUG:root:Iter 1940: Train loss 0.667, It/sec 5.373, Tokens/sec 21.493
DEBUG:root:Iter 1950: Train loss 1.039, It/sec 5.373, Tokens/sec 21.492
DEBUG:root:Iter 1960: Train loss 1.232, It/sec 5.374, Tokens/sec 21.494
DEBUG:root:Iter 1970: Train loss 1.127, It/sec 5.372, Tokens/sec 21.486
DEBUG:root:Iter 1980: Train loss 1.007, It/sec 5.360, Tokens/sec 21.440
DEBUG:root:Iter 1990: Train loss 1.244, It/sec 5.333, Tokens/sec 21.333
DEBUG:root:Iter 2000: Train loss 1.201, It/sec 5.377, Tokens/sec 21.506
DEBUG:root:Iter 2000: Val loss 1.097, Val took 2.296s
DEBUG:root:Iter 2000: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 2010: Train loss 0.749, It/sec 5.346, Tokens/sec 21.384
DEBUG:root:Iter 2020: Train loss 1.067, It/sec 5.364, Tokens/sec 21.455
DEBUG:root:Iter 2030: Train loss 1.068, It/sec 5.374, Tokens/sec 21.495
DEBUG:root:Iter 2040: Train loss 1.153, It/sec 5.378, Tokens/sec 21.512
DEBUG:root:Iter 2050: Train loss 0.711, It/sec 5.370, Tokens/sec 21.481
DEBUG:root:Iter 2060: Train loss 0.932, It/sec 5.372, Tokens/sec 21.486
DEBUG:root:Iter 2070: Train loss 1.541, It/sec 5.367, Tokens/sec 21.466
DEBUG:root:Iter 2080: Train loss 1.184, It/sec 5.374, Tokens/sec 21.498
DEBUG:root:Iter 2090: Train loss 1.108, It/sec 5.378, Tokens/sec 21.511
DEBUG:root:Iter 2100: Train loss 0.967, It/sec 5.374, Tokens/sec 21.496
DEBUG:root:Iter 2100: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 2110: Train loss 0.774, It/sec 5.338, Tokens/sec 21.351
DEBUG:root:Iter 2120: Train loss 0.714, It/sec 5.375, Tokens/sec 21.499
DEBUG:root:Iter 2130: Train loss 0.920, It/sec 5.374, Tokens/sec 21.494
DEBUG:root:Iter 2140: Train loss 0.944, It/sec 5.373, Tokens/sec 21.494
DEBUG:root:Iter 2150: Train loss 1.050, It/sec 5.377, Tokens/sec 21.507
DEBUG:root:Iter 2160: Train loss 1.109, It/sec 5.371, Tokens/sec 21.482
DEBUG:root:Iter 2170: Train loss 0.923, It/sec 5.380, Tokens/sec 21.519
DEBUG:root:Iter 2180: Train loss 1.199, It/sec 5.318, Tokens/sec 21.274
DEBUG:root:Iter 2190: Train loss 1.229, It/sec 5.372, Tokens/sec 21.488
DEBUG:root:Iter 2200: Train loss 0.729, It/sec 5.376, Tokens/sec 21.503
DEBUG:root:Iter 2200: Val loss 1.116, Val took 2.294s
DEBUG:root:Iter 2200: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 2210: Train loss 1.012, It/sec 5.336, Tokens/sec 21.343
DEBUG:root:Iter 2220: Train loss 1.001, It/sec 5.360, Tokens/sec 21.439
DEBUG:root:Iter 2230: Train loss 1.074, It/sec 5.380, Tokens/sec 21.521
DEBUG:root:Iter 2240: Train loss 1.205, It/sec 5.382, Tokens/sec 21.529
DEBUG:root:Iter 2250: Train loss 0.803, It/sec 5.374, Tokens/sec 21.498
DEBUG:root:Iter 2260: Train loss 1.316, It/sec 5.379, Tokens/sec 21.516
DEBUG:root:Iter 2270: Train loss 0.921, It/sec 5.379, Tokens/sec 21.514
DEBUG:root:Iter 2280: Train loss 0.964, It/sec 5.368, Tokens/sec 21.472
DEBUG:root:Iter 2290: Train loss 0.788, It/sec 5.394, Tokens/sec 21.577
DEBUG:root:Iter 2300: Train loss 1.231, It/sec 5.381, Tokens/sec 21.524
DEBUG:root:Iter 2300: Saved adapter weights to adapters.npz.
DEBUG:root:Iter 2310: Train loss 0.936, It/sec 5.352, Tokens/sec 21.407
DEBUG:root:Iter 2320: Train loss 1.225, It/sec 5.362, Tokens/sec 21.447
DEBUG:root:Iter 2330: Train loss 1.159, It/sec 5.354, Tokens/sec 21.416
DEBUG:root:Iter 2340: Train loss 1.203, It/sec 5.375, Tokens/sec 21.500
DEBUG:root:Iter 2350: Train loss 1.442, It/sec 5.374, Tokens/sec 21.494
DEBUG:root:Iter 2360: Train loss 0.872, It/sec 5.375, Tokens/sec 21.501
DEBUG:root:Iter 2370: Train loss 0.854, It/sec 5.376, Tokens/sec 21.502
DEBUG:root:Iter 2380: Train loss 1.188, It/sec 5.323, Tokens/sec 21.292
DEBUG:root:Iter 2390: Train loss 0.962, It/sec 5.391, Tokens/sec 21.562
DEBUG:root:Iter 2400: Train loss 0.732, It/sec 5.415, Tokens/sec 21.661
DEBUG:root:Iter 2400: Val loss 1.110, Val took 2.273s
DEBUG:root:Iter 2400: Saved adapter weights to adapters.npz.
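The `Val loss` lines can be extracted to show where validation loss bottomed out during this run. Below is a minimal sketch that assumes the `DEBUG:root:` log format shown above. `parse_val_lines` and `best_val` are hypothetical helpers, not part of mlx-examples.

```python
import re

# Assumed log format (copied from the lines above), e.g.:
# DEBUG:root:Iter 600: Val loss 1.076, Val took 2.295s
VAL_RE = re.compile(r"Iter (\d+): Val loss ([\d.]+), Val took ([\d.]+)s")


def parse_val_lines(log_text):
    """Return (iteration, val_loss, seconds) tuples in log order."""
    return [
        (int(it), float(loss), float(secs))
        for it, loss, secs in VAL_RE.findall(log_text)
    ]


def best_val(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    sample = (
        "DEBUG:root:Iter 400: Val loss 1.094, Val took 2.298s\n"
        "DEBUG:root:Iter 600: Val loss 1.076, Val took 2.295s\n"
        "DEBUG:root:Iter 800: Val loss 1.078, Val took 2.298s\n"
    )
    print(best_val(parse_val_lines(sample)))
```

In this log, the lowest reported validation loss is 1.076 at iteration 600. After that, it stays between about 1.08 and 1.14. Every save goes to the same `adapters.npz` name. If each save overwrites that file, the weights left at the end are from iteration 2400, not from the lowest-validation-loss checkpoint.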