Update README.md
README.md
@@ -238,10 +238,10 @@ The model is based on the Mamba architecture ([Gu et al., 2023](https://arxiv.or
 
 | **Hyperparameter** | **Value** | **Comment** |
 |--------------------|-----------|----------------------------------------|
-| Layers | 64 |
-| `d_model` | 4096 |
+| Layers | 64 | Number of layers |
+| `d_model` | 4096 | Hidden dimension |
 | `d_state` | 16 | The SSM state dimension |
-| Vocabulary | 65024 |
+| Vocabulary | 65024 | Vocabulary Size |
 | Sequence length | 8192 | During stages 4 and LR Decay stage |
 
 ## Compute Infrastructure