---
license: mit
datasets:
- cerebras/SlimPajama-627B
language:
- en
---
This is a checkpoint of the 1.3B GLA model used in the paper [Gated Linear Attention](https://arxiv.org/abs/2312.06635). The model was trained on 100B tokens from the SlimPajama dataset, tokenized with the Llama-2 tokenizer.

See the model definition and loading script in this [repo](https://github.com/berlino/gated_linear_attention).
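
As a rough illustration, the checkpoint could be loaded with the Hugging Face `transformers` API along the lines of the sketch below. The repo id shown is a placeholder and the `trust_remote_code` requirement is an assumption; refer to the loading script in the repository above for the authors' supported path.

```python
# Minimal loading sketch. "fla-hub/gla-1.3B-100B" is a hypothetical repo id,
# not confirmed by this card; replace it with the actual checkpoint location.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fla-hub/gla-1.3B-100B"  # placeholder

# GLA is not part of core transformers, so the custom model code from the
# linked repo must be available (here assumed to be loaded via trust_remote_code).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Gated linear attention is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```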