BEE-spoke-data/smol_llama-101M-GQA
Text Generation • Updated • 4.36k • 26
Small-scale pretraining experiments of mine.
Note: smol_llama-220M-GQA with continued pretraining (CPT) on fineweb-edu for 10 billion tokens.
Note: This is a mid-training checkpoint of what is now smol_llama-220M.