Commit a29aacd (verified) by ybelkada · 1 parent: 18641ad

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -238,10 +238,10 @@ The model is based on the Mamba architecture ([Gu et al., 2023](https://arxiv.or
 
  | **Hyperparameter** | **Value** | **Comment** |
  |--------------------|-----------|----------------------------------------|
- | Layers | 64 | |
- | `d_model` | 4096 | |
+ | Layers | 64 | Number of layers |
+ | `d_model` | 4096 | Hidden dimension |
  | `d_state` | 16 | The SSM state dimension |
- | Vocabulary | 65024 | |
+ | Vocabulary | 65024 | Vocabulary Size |
  | Sequence length | 8192 | During stages 4 and LR Decay stage |
 
  ## Compute Infrastructure
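
As a reviewer's aid, the hyperparameters in the table touched by this hunk can be collected into a small config object. This is only a sketch: the class name `ModelHyperparams`, its field names, and the `embedding_params` helper are hypothetical and do not come from the README or from any library. The embedding count assumes a single token-embedding matrix of shape vocabulary × `d_model`, which the README does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelHyperparams:
    """Hypothetical container; values copied from the README table in this diff."""

    n_layers: int = 64        # "Layers": number of layers
    d_model: int = 4096       # hidden dimension
    d_state: int = 16         # SSM state dimension
    vocab_size: int = 65024   # vocabulary size
    max_seq_len: int = 8192   # sequence length listed in the table

    def embedding_params(self) -> int:
        # Assumption: one d_model-sized row per vocabulary entry.
        return self.vocab_size * self.d_model


hp = ModelHyperparams()
print(hp.embedding_params())  # 266338304
```

Writing the values down this way makes the new comment column easy to check: each comment added in the commit names exactly one field.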