This is a checkpoint of the 1.3B GLA model used in the paper Gated Linear Attention. The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama2 tokenizer.

See the model and loading script in this repo.


Safetensors: model size 1.37B params, tensor type F32.
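As a rough guide to what 1.37B parameters stored as F32 means in practice, the sketch below estimates the size of the weights alone. It is back-of-the-envelope arithmetic, not a measurement of the actual checkpoint files. The helper name `weight_size_gb` is hypothetical, and the BF16 figure assumes you cast the weights yourself, since the checkpoint is published in F32.

```python
# Approximate bytes per parameter for common tensor types.
BYTES_PER_DTYPE = {"F32": 4, "BF16": 2, "F16": 2}


def weight_size_gb(num_params: float, dtype: str = "F32") -> float:
    """Return the approximate weight size in gigabytes (10**9 bytes).

    Hypothetical helper for illustration; it ignores file metadata,
    activations, and any recurrent state kept during inference.
    """
    return num_params * BYTES_PER_DTYPE[dtype] / 1e9


NUM_PARAMS = 1.37e9  # parameter count reported for this checkpoint

size_f32 = weight_size_gb(NUM_PARAMS, "F32")    # as published
size_bf16 = weight_size_gb(NUM_PARAMS, "BF16")  # assumption: cast after loading

print(f"F32:  {size_f32:.2f} GB")
print(f"BF16: {size_bf16:.2f} GB")
```

The weights therefore need roughly 5.5 GB of memory as published, or about half that if cast to a 16-bit type.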


Model repository: bailin28/gla-1B-100B