Training script #13
opened by ccdv
hey, thanks for reaching out! I don't have anything on hand at the moment. I'll let you know if I dig through and find it, but essentially I used a variant of the Longformer training notebook, the key enabler being DeepSpeed.
DeepSpeed JSON
Typically I use ZeRO-2 and roll with something like:
{
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "round_robin_gradients": true,
    "contiguous_gradients": true
  },
  "bfloat16": {
    "enabled": "auto"
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 4000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
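For context on how a config like this plugs in (a minimal sketch, not the exact script used here): when the JSON is passed via the deepspeed argument of the Hugging Face TrainingArguments, the Trainer resolves every "auto" field from its own arguments. The output dir and hyperparameter values below are illustrative placeholders.

    # minimal sketch: the HF Trainer fills in each "auto" field
    # of the DeepSpeed JSON from the matching argument here.
    from transformers import Seq2SeqTrainingArguments

    args = Seq2SeqTrainingArguments(
        output_dir="./out",              # placeholder
        per_device_train_batch_size=1,   # -> train_micro_batch_size_per_gpu
        gradient_accumulation_steps=16,  # -> gradient_accumulation_steps
        learning_rate=3e-5,              # -> optimizer lr
        weight_decay=0.01,               # -> optimizer weight_decay
        max_grad_norm=1.0,               # -> gradient_clipping
        bf16=True,                       # -> bfloat16.enabled
        deepspeed="ds_config.json",      # path to the JSON above
    )
    # build a Seq2SeqTrainer(model=..., args=args, ...) as usual, then
    # launch with the deepspeed launcher, e.g.: deepspeed train.py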
ok thanks
Got the training done at 4096 sequence length; will try up to 16384 tokens now.
Hey, let me know if you have any other questions or issues with training. Feel free to comment here or reopen, or message me on Discord: mrshadow773#0840 :)
pszemraj changed discussion status to closed