---
license: mit
datasets:
- cerebras/SlimPajama-627B
language:
- en
---
This is a checkpoint of the 1.3B GLA model used in the paper [Gated Linear Attention](https://arxiv.org/abs/2312.06635). The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama-2 tokenizer.

See the model definition and loading script in this [repo](https://github.com/berlino/gated_linear_attention).
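
As a rough illustration, the checkpoint could be loaded with the Hugging Face `transformers` API along the lines of the sketch below. The repo id shown is a placeholder and the `trust_remote_code` requirement is an assumption; refer to the loading script in the repository above for the authors' supported path.

```python
# Minimal loading sketch. "fla-hub/gla-1.3B-100B" is a hypothetical repo id,
# not confirmed by this card; replace it with the actual checkpoint location.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fla-hub/gla-1.3B-100B"  # placeholder

# GLA is not part of core transformers, so the custom model code from the
# linked repo must be available (here assumed to be loaded via trust_remote_code).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Gated linear attention is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```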