---
datasets:
- EleutherAI/pile
language:
- en
---

A Based (linear attention) model that normalizes the attention output with a LayerNorm instead of dividing by `QK.sum(-1)`, for better hardware efficiency.
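The two normalizations are easiest to compare in the recurrent form of causal linear attention. The sketch below is a minimal pure-Python illustration, not this model's implementation. It assumes Based's 2nd-order Taylor feature map (omitting any scaling of q and k) and a LayerNorm with no learned gain or bias, applied over the value dimension of each output. The helper names (`taylor_feature_map`, `layer_norm`, `causal_linear_attention`) are hypothetical.

```python
import math


def taylor_feature_map(x):
    # 2nd-order Taylor features: phi(q)·phi(k) = 1 + q·k + (q·k)**2 / 2, which is always >= 1/2
    d = len(x)
    second = [x[i] * x[j] / math.sqrt(2) for i in range(d) for j in range(d)]
    return [1.0] + list(x) + second


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def layer_norm(x, eps=1e-5):
    # Plain LayerNorm over the last dimension (assumption: no learned gain or bias)
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def causal_linear_attention(qs, ks, vs, norm="sum"):
    """Recurrent form of causal linear attention (hypothetical helper).

    norm="sum":       y_i = phi(q_i)^T S_i / (phi(q_i)^T z_i), the QK.sum(-1) denominator
    norm="layernorm": y_i = LayerNorm(phi(q_i)^T S_i), no running normalizer z_i needed
    """
    if norm not in ("sum", "layernorm"):
        raise ValueError(f"unknown norm: {norm}")
    feat_dim = len(taylor_feature_map(qs[0]))
    dv = len(vs[0])
    state = [[0.0] * dv for _ in range(feat_dim)]  # S_i = sum_{j<=i} phi(k_j) v_j^T
    z = [0.0] * feat_dim                            # z_i = sum_{j<=i} phi(k_j), only used by "sum"
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fq, fk = taylor_feature_map(q), taylor_feature_map(k)
        for a in range(feat_dim):
            for b in range(dv):
                state[a][b] += fk[a] * v[b]
        numer = [sum(fq[a] * state[a][b] for a in range(feat_dim)) for b in range(dv)]
        if norm == "sum":
            for a in range(feat_dim):
                z[a] += fk[a]
            denom = dot(fq, z)  # row sum of the causally masked attention matrix
            outputs.append([n / denom for n in numer])
        else:
            outputs.append(layer_norm(numer))
    return outputs


qs = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]
ks = [[0.2, 0.1], [-0.4, 0.3], [0.1, 0.1]]
vs = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0], [2.0, 2.0, 0.0]]
print(causal_linear_attention(qs, ks, vs, norm="sum"))
print(causal_linear_attention(qs, ks, vs, norm="layernorm"))
```

With `norm="sum"` the recurrence has to carry the extra running sum `z` and divide every position's output by `phi(q_i)·z_i`. With `norm="layernorm"` that state and the per-position division go away, and the output is rescaled by a standard LayerNorm instead.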