---
license: mit
datasets:
- cerebras/SlimPajama-627B
language:
- en
---


This is a checkpoint of the 1.3B GLA model from the paper [Gated Linear Attention](https://arxiv.org/abs/2312.06635). The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama-2 tokenizer.

See the model definition and loading script in this [repo](https://github.com/berlino/gated_linear_attention).
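
As a rough illustration, here is a minimal sketch of how a checkpoint like this is typically loaded and queried. The checkpoint path, the use of `AutoModelForCausalLM` with `trust_remote_code`, and the generation settings are assumptions for the sake of the example; the linked repo's loading script is the authoritative procedure. Only the Llama-2 tokenizer is stated above.

```python
# Minimal usage sketch. Assumptions: the checkpoint directory
# "path/to/gla-1.3b" is a placeholder, and loading via transformers
# with trust_remote_code=True is hypothetical -- see the linked
# GitHub repo for the actual loading script.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# The model was trained with the Llama-2 tokenizer (stated above).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

model = AutoModelForCausalLM.from_pretrained(
    "path/to/gla-1.3b",      # placeholder checkpoint path
    trust_remote_code=True,  # assumes a custom GLA model class
    torch_dtype=torch.bfloat16,
).eval()

inputs = tokenizer("Gated linear attention is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```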