Training script

by ccdv - opened

Hey @pszemraj
Do you still have the training script of this model?

Thank you

Hey, thanks for reaching out! I don't have anything on hand at the moment. I'll let you know if I dig through and find it, but essentially I used a variant of the Longformer training notebook, with DeepSpeed being the key enabler.

DeepSpeed JSON

Typically I use ZeRO-2 and roll with something like:
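
The config itself is not shown in this thread. As a rough sketch only, a ZeRO-2 config for the Hugging Face Trainer integration could be built as below. Every value here is an assumption for illustration, not the settings used for this model: the bucket sizes are placeholders, and the `"auto"` entries rely on the Trainer filling them in from its own arguments. The helper name `build_zero2_config` is hypothetical.

```python
import json


def build_zero2_config():
    """Return an illustrative DeepSpeed ZeRO stage 2 config (assumed values)."""
    return {
        # "auto" lets the Hugging Face Trainer fill these in from TrainingArguments.
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "fp16": {"enabled": "auto"},
        "zero_optimization": {
            "stage": 2,  # ZeRO-2: partition optimizer states and gradients
            "allgather_partitions": True,
            "allgather_bucket_size": 200_000_000,  # placeholder size
            "overlap_comm": True,
            "reduce_scatter": True,
            "reduce_bucket_size": 200_000_000,  # placeholder size
            "contiguous_gradients": True,
        },
    }


ds_config = build_zero2_config()
# Serialize to the JSON text that would go into a DeepSpeed config file.
ds_config_json = json.dumps(ds_config, indent=2)
```

Under those assumptions, the JSON can be saved to a file and its path (or the dict itself) passed through the `deepspeed` argument of `TrainingArguments`.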


OK, thanks.
Got the training done at a sequence length of 4096; will try up to 16384 tokens now.

Hey, let me know if you have any other questions or issues with training. Feel free to either comment here/reopen, or message me on Discord (mrshadow773#0840) :)

pszemraj changed discussion status to closed
