distily_bench_obj_cross_v2 / logs / attn_loss_fn=None, attn_weight=0, gradient_accumulation_steps=1, hs_loss_fn=mse, hs_weight=2.0, learning_rate=0.0004, lr_scheduler_type=cosine_with_restarts, max_grad_norm=None, num_cycles=4, optim=pa

This model has 1 file scanned as unsafe.

lapp0
Training in progress, step 12375
9b0a1fe verified