view article Article Efficient LLM Pretraining: Packed Sequences and Masked Attention By sirluk • Oct 7, 2024 • 10
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 14 days ago • 109