JingweiZuo committed on
Commit 8c8f700
Parent(s): 89fee4a
Update README.md
README.md
CHANGED
@@ -250,7 +250,7 @@ The model is based on the Mamba architecture ([Gu et al., 2023](https://arxiv.or
 | `d_model` | 4096 | Hidden dimension |
 | `d_state` | 16 | The SSM state dimension |
 | Vocabulary | 65024 | Vocabulary Size |
-| Sequence length | 8192 | During
+| Sequence length | 8192 | During the last training stages |
 
 ## Compute Infrastructure
 
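The hyperparameter table touched by this change can be captured in a small Python dataclass. This is a minimal sketch. The class name `MambaTableConfig` and its methods are hypothetical. The expansion factor `expand = 2` is an assumption taken from the Mamba paper's default and is not stated in the table. Under that assumption, the sketch shows that the per-layer selective-scan state has shape `(d_inner, d_state)` and so does not grow with the 8192-token sequence length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MambaTableConfig:
    """Hypothetical container for the hyperparameters listed in the README table."""

    d_model: int = 4096      # Hidden dimension
    d_state: int = 16        # The SSM state dimension
    vocab_size: int = 65024  # Vocabulary size
    seq_len: int = 8192      # Sequence length during the last training stages
    expand: int = 2          # ASSUMPTION: Mamba's default expansion factor, not given in the table

    @property
    def d_inner(self) -> int:
        # Width of the expanded inner projection inside a Mamba block.
        return self.expand * self.d_model

    def ssm_state_elements_per_layer(self) -> int:
        # The recurrent SSM state has shape (d_inner, d_state) per sequence,
        # so its size is independent of seq_len.
        return self.d_inner * self.d_state


cfg = MambaTableConfig()
print(cfg.d_inner)                         # 8192
print(cfg.ssm_state_elements_per_layer())  # 131072
```

Because the state size depends only on `d_model`, `expand`, and `d_state`, a longer sequence changes how many recurrent steps are taken but not the memory held per layer for the SSM state.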