HyperCLOVA X Technical Report
2404.01954
Sequences are grouped into batches by similar token length to make better use of the GPU/hardware. Mini-batches therefore differ in sequence length, but each one has the same maximum number of tokens.
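A minimal sketch of this kind of length-grouped batching, not the report's actual implementation. The function name `batch_by_token_budget` and the parameter `max_tokens` are hypothetical, and the sketch assumes the token budget is counted on the padded batch (number of sequences times the longest sequence in the batch).

```python
from typing import List


def batch_by_token_budget(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Group sequence indices into mini-batches of similar length.

    Assumption: the cost of a batch is its padded size, i.e.
    (number of sequences) * (longest sequence in the batch),
    and that cost must not exceed max_tokens.
    """
    # Sort indices by length so neighbours have similar lengths (little padding).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        n = lengths[idx]
        if n > max_tokens:
            raise ValueError(f"sequence {idx} has {n} tokens, above the budget")
        # Lengths are ascending, so n would be the longest sequence in the batch.
        if current and (len(current) + 1) * n > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    lengths = [5, 120, 7, 118, 6, 60, 62]
    for batch in batch_by_token_budget(lengths, max_tokens=128):
        print(batch, [lengths[i] for i in batch])
```

Short sequences end up in batches with many rows and long sequences in batches with few rows, so the batch shape varies while the token budget per batch stays fixed.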