license: apache-2.0

Grok-1

This repository contains the weights of the Grok-1 open-weights model.

                     ╔══════════════════════════╗
                     ║                 _______  ║
                     ║            /\   |_   _|  ║
                     ║  __  __   /  \    | |    ║
                     ║  \ \/ /  / /\ \   | |    ║
                     ║   >  <  / ____ \ _| |_   ║
                     ║  /_/\_\/_/    \_\_____|  ║
                     ║                          ║
                     ║  Understand the Universe ║
                     ║      [https://x.ai]      ║
                     ╚════════════╗╔════════════╝
                         ╔════════╝╚═════════╗
                         ║ xAI Grok-1 (314B) ║
                         ╚════════╗╔═════════╝
            ╔═════════════════════╝╚═════════════════════╗
            ║ 314B parameter Mixture of Experts model    ║
            ║ - Base model (not finetuned)               ║
            ║ - 8 experts (2 active)                     ║
            ║ - 86B active parameters                    ║
            ║ - Apache 2.0 license                       ║
            ║ - Code: https://github.com/xai-org/grok-1  ║
            ║ - Happy coding!                            ║
            ╚════════════════════════════════════════════╝

Model Configuration Details

Vocabulary Size: 131,072

Special Tokens:

Pad Token: 0
End of Sequence Token: 2

Sequence Length: 8192
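
As a rough illustration of how the special tokens and the sequence length fit together, the sketch below pads a list of token ids to a fixed length and trims generated ids at the end-of-sequence token. The helper names are hypothetical, and right-padding with left truncation is an assumption made for this example, not necessarily what the example code does.

```python
PAD_TOKEN = 0           # "Pad Token" from the list above
EOS_TOKEN = 2           # "End of Sequence Token" from the list above
SEQUENCE_LENGTH = 8192  # "Sequence Length" from the list above


def pad_to_length(token_ids, length=SEQUENCE_LENGTH, pad=PAD_TOKEN):
    """Keep at most the last `length` ids, then right-pad with the pad id.

    Assumption: right padding and left truncation; the real code may differ.
    """
    ids = list(token_ids)
    kept = ids[max(0, len(ids) - length):]
    return kept + [pad] * (length - len(kept))


def strip_after_eos(token_ids, eos=EOS_TOKEN):
    """Return the ids before the first end-of-sequence id, if there is one."""
    ids = list(token_ids)
    return ids[: ids.index(eos)] if eos in ids else ids


print(pad_to_length([5, 6, 7], length=5))  # [5, 6, 7, 0, 0]
print(strip_after_eos([5, 6, 2, 7]))       # [5, 6]
```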

Model Architecture: Mixture of Experts (MoE)

Embedding Size: 6,144
Layers: 64
Experts: 8
Selected Experts: 2
Widening Factor: 8
Key Size: 128
Query Heads: 48
Key Value Heads: 8
Activation Sharding: Data-wise, Model-wise
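
The numbers above imply a few relationships that are easy to check by hand. The sketch below collects them in a hypothetical `GrokConfigSketch` container (not a class from the repository) and adds a toy top-k router to show what "2 of 8 experts selected" means. Applying the softmax over only the selected experts is an assumption made for illustration; the actual router may normalize differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GrokConfigSketch:
    """Hypothetical container; the values are copied from the list above."""
    emb_size: int = 6144
    num_layers: int = 64
    num_experts: int = 8
    num_selected_experts: int = 2
    widening_factor: int = 8
    key_size: int = 128
    num_q_heads: int = 48
    num_kv_heads: int = 8

    @property
    def attention_width(self) -> int:
        # Total query width: number of query heads times the per-head size.
        return self.num_q_heads * self.key_size

    @property
    def q_heads_per_kv_head(self) -> int:
        # Grouped-query attention: several query heads share one key/value head.
        return self.num_q_heads // self.num_kv_heads


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and softmax over just those k."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


cfg = GrokConfigSketch()
print(cfg.attention_width)      # 6144, equal to the embedding size
print(cfg.q_heads_per_kv_head)  # 6

# Made-up router scores for one token, one score per expert.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.9]
for expert, weight in route_top_k(logits, cfg.num_selected_experts):
    print(expert, round(weight, 3))
```

Only the two selected experts run for a given token, which is why the active parameter count is much smaller than the total.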

Inference Configuration:

Batch Size per Device: 0.125
Tokenizer: ./tokenizer.model
Local Mesh: 1x8
Between Hosts: 1x1
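
A fractional batch size per device can look odd at first. One way to read it, assuming the global batch is the per-device value multiplied by the total device count, is that a single sequence is shared across all eight devices in the mesh. The function name below is hypothetical.

```python
from math import prod


def global_batch_size(bs_per_device, local_mesh, between_hosts):
    """Return (total devices, global batch size) for the given mesh shapes.

    Assumption: global batch = per-device batch * devices per host * hosts.
    """
    devices = prod(local_mesh) * prod(between_hosts)
    return devices, int(bs_per_device * devices)


devices, batch = global_batch_size(0.125, (1, 8), (1, 1))
print(devices, batch)  # 8 1
```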

Inference Details

Download the int8 checkpoint into the checkpoints directory, then run

pip install -r requirements.txt
python transformer.py

to test the code.
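
Before launching the script, it can help to confirm that the expected inputs are in place. The following is a minimal sketch, assuming the checkpoint lives in a directory named checkpoints and the tokenizer at ./tokenizer.model relative to the working directory, as described above. The `missing_inputs` helper is hypothetical and only checks that the paths exist, not that the files are valid.

```python
from pathlib import Path


def missing_inputs(root=".", checkpoint_dir="checkpoints",
                   tokenizer="tokenizer.model"):
    """Return the expected paths that are absent (or an empty checkpoint dir)."""
    root = Path(root)
    missing = []
    ckpt = root / checkpoint_dir
    # The checkpoint directory must exist and contain at least one entry.
    if not ckpt.is_dir() or not any(ckpt.iterdir()):
        missing.append(str(ckpt))
    tok = root / tokenizer
    if not tok.is_file():
        missing.append(str(tok))
    return missing


problems = missing_inputs()
print("ready to run" if not problems else f"missing: {problems}")
```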

You should see output from the language model.

Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code.
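
For a sense of scale, the back-of-the-envelope estimate below computes the memory needed for the weights alone. It assumes every parameter is stored in one byte (int8) and that the weights are split evenly over the 8 devices of the 1x8 local mesh. It ignores activations, the key/value cache, and any tensors or quantization scales kept at higher precision, so real usage will be higher.

```python
PARAMS = 314e9  # 314B parameters, from the model description
DEVICES = 8     # 1x8 local mesh


def weight_gigabytes(params, bytes_per_param):
    """Size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


total_int8 = weight_gigabytes(PARAMS, 1)
print(total_int8)            # 314.0 GB for int8 weights
print(total_int8 / DEVICES)  # 39.25 GB per device, weights only
```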

p.s. we're hiring: https://x.ai/career