|
--- |
|
license: apache-2.0 |
|
--- |
|
# Grok-1 |
|
--- |
|
_This repository contains the weights of the Grok-1 open-weights model._ |
|
|
|
**To get started with the model, follow the instructions at** `github.com/xai-org/grok-1`.
|
|
|
|
|
 |
|
|
|
<small>The cover image was generated using [Midjourney](https://midjourney.com) based on the following prompt proposed by Grok: "A 3D illustration of a neural network, with transparent nodes and glowing connections, showcasing the varying weights as different thicknesses and colors of the connecting lines."</small>
|
|
|
--- |
|
|
|
╔════════════════════════════╗
║                     _____  ║
║             /\     |_   _| ║
║  __  __    /  \      | |   ║
║  \ \/ /   / /\ \     | |   ║
║   >  <   / ____ \   _| |_  ║
║  /_/\_\ /_/    \_\ |_____| ║
║                            ║
║  Understand the Universe   ║
║       [https://x.ai]       ║
╚════════════════════════════╝

╔═══════════════════╗
║ xAI Grok-1 (314B) ║
╚═══════════════════╝

╔════════════════════════════════════════════╗
║ 314B parameter Mixture of Experts model    ║
║ - Base model (not finetuned)               ║
║ - 8 experts (2 active)                     ║
║ - 86B active parameters                    ║
║ - Apache 2.0 license                       ║
║ - Code: https://github.com/xai-org/grok-1  ║
║ - Happy coding!                            ║
╚════════════════════════════════════════════╝
|
|
|
## Model Configuration Details |
|
|
|
**Vocabulary Size**: 131,072 |
|
|
|
**Special Tokens**: |
|
- Pad Token: 0 |
|
- End of Sequence Token: 2 |
|
|
|
**Sequence Length**: 8192 |
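
If you want to cross-check these figures against the shipped tokenizer (the `./tokenizer.model` file referenced under Inference Configuration below), here is a minimal sketch using the `sentencepiece` package. It assumes the tokenizer file has already been downloaded next to your script, and that the pad/EOS ids are stored in the model file itself rather than only set by the runner.

```python
# Sketch: inspect the SentencePiece tokenizer shipped with the weights.
# Assumes ./tokenizer.model is already downloaded; the special-token ids may
# instead be defined by the runner, so treat the printed ids as a cross-check.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

print(sp.vocab_size())            # expected: 131072
print(sp.pad_id(), sp.eos_id())   # expected: 0 and 2, per the values above
print(sp.encode("Understand the Universe"))  # a list of ids, each < 131072
```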
|
|
|
### **Model Architecture**: MoE |
|
- **Embedding Size**: 6,144 |
|
- Rotary Embedding (RoPE) |
|
- **Layers**: 64 |
|
- **Experts**: 8 |
|
- **Selected Experts**: 2 (see the routing sketch after this list)
|
- **Widening Factor**: 8 |
|
- **Key Size**: 128 |
|
- **Query Heads**: 48 |
|
- **Key Value Heads**: 8 |
|
- **Activation Sharding**: Data-wise, Model-wise |
|
- **Tokenizer**: SentencePiece
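
To make the expert numbers concrete, below is a minimal sketch of top-2-of-8 routing, the selection scheme implied by 8 experts with 2 selected. It is written in JAX for illustration only: the sizes are scaled down from the 6,144-dim embedding listed above, the evaluate-all-experts loop is deliberately naive, and the gating details (softmax router, renormalized top-2 weights, GeLU experts) are assumptions rather than Grok-1's actual implementation.

```python
import jax
import jax.numpy as jnp

NUM_EXPERTS = 8    # "Experts: 8"
NUM_SELECTED = 2   # "Selected Experts: 2"
EMB = 64           # toy size for the sketch; Grok-1 uses 6,144
FFN = EMB * 8      # "Widening Factor: 8" (assumed to be the expert FFN width)

def moe_layer(params, x):
    """Route each token to its top-2 experts and mix the expert outputs."""
    # Router: one score per expert per token, turned into probabilities.
    probs = jax.nn.softmax(x @ params["router"], axis=-1)        # [T, 8]
    gate_vals, expert_ids = jax.lax.top_k(probs, NUM_SELECTED)   # [T, 2]
    gate_vals = gate_vals / gate_vals.sum(axis=-1, keepdims=True)

    # Dense per-expert gate: zero for the six unselected experts.
    one_hot = jax.nn.one_hot(expert_ids, NUM_EXPERTS)            # [T, 2, 8]
    gates = (gate_vals[..., None] * one_hot).sum(axis=1)         # [T, 8]

    # Naive dense evaluation: run every expert and mask. Real MoE kernels
    # dispatch only the selected tokens to each expert.
    out = jnp.zeros_like(x)
    for e in range(NUM_EXPERTS):
        h = jax.nn.gelu(x @ params["w_in"][e]) @ params["w_out"][e]
        out = out + gates[:, e:e + 1] * h
    return out

key = jax.random.PRNGKey(0)
params = {
    "router": 0.02 * jax.random.normal(key, (EMB, NUM_EXPERTS)),
    "w_in":   0.02 * jax.random.normal(key, (NUM_EXPERTS, EMB, FFN)),
    "w_out":  0.02 * jax.random.normal(key, (NUM_EXPERTS, FFN, EMB)),
}
tokens = jax.random.normal(key, (4, EMB))   # four toy token embeddings
print(moe_layer(params, tokens).shape)      # (4, 64)
```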
|
|
|
### **Inference Configuration**: |
|
- Batch Size per Device: 0.125 (see the sketch after this list)
|
- Tokenizer: `./tokenizer.model` |
|
- Local Mesh: 1x8 |
|
- Between Hosts: 1x1 |
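
As a rough sketch of how these numbers fit together: a 1x8 local mesh with 1x1 between hosts means 8 accelerators on a single host, and a per-device batch size of 0.125 means those 8 devices jointly serve a global batch of one sequence. The mesh construction and axis names below are illustrative assumptions, not the actual runner code.

```python
import numpy as np
import jax
from jax.sharding import Mesh

LOCAL_MESH = (1, 8)        # "Local Mesh: 1x8"
BETWEEN_HOSTS = (1, 1)     # "Between Hosts: 1x1"
BATCH_PER_DEVICE = 0.125   # "Batch Size per Device: 0.125"

num_devices = int(np.prod(LOCAL_MESH) * np.prod(BETWEEN_HOSTS))  # 8
global_batch = BATCH_PER_DEVICE * num_devices                    # 1.0
print(f"{num_devices} devices, global batch size {global_batch:g}")

# Build a 1x8 device mesh if this machine actually exposes 8 accelerators.
# The ("data", "model") axis names are an assumption for illustration.
if len(jax.devices()) >= num_devices:
    devices = np.asarray(jax.devices()[:num_devices]).reshape(LOCAL_MESH)
    mesh = Mesh(devices, axis_names=("data", "model"))
    print(mesh)
```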
|
|
|
|
|
## Inference Details |
|
|
|
Make sure to download the `int8` checkpoint to the `checkpoints` directory and run |
|
|
|
```shell
pip install -r requirements.txt
python transformer.py
```
|
|
|
to test the code. |
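
If the weights are not yet on disk, one way to fetch them is through the `huggingface_hub` Python API. This is only a sketch: the repository ID and the `ckpt-0` folder name are assumptions, so check the repository's file listing first, and note that the download is several hundred gigabytes.

```python
# Sketch: download the int8 checkpoint into ./checkpoints.
# The repo ID and folder name are assumptions -- verify them before running.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",        # assumed repository ID
    repo_type="model",
    allow_patterns=["ckpt-0/*"],     # assumed checkpoint folder
    local_dir="checkpoints",
)
```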
|
|
|
You should see output from the language model.
|
|
|
Due to the large size of the model (314B parameters), a multi-GPU machine is required to test the model with the example code. |
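
Before launching, a quick way to confirm that the machine exposes enough accelerators is to ask JAX directly (the reference code is JAX-based; the expectation of 8 local devices follows from the 1x8 local mesh above):

```python
import jax

devices = jax.devices()
print(f"{len(devices)} device(s) visible:", [d.device_kind for d in devices])

# The 1x8 local mesh in the inference configuration implies 8 local devices.
if len(devices) < 8:
    print("Warning: fewer than 8 devices; the default mesh settings will not fit.")
```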
|
|
|
**p.s. we're hiring: https://x.ai/careers** |