tiiuae/falcon-mamba-7b


ybelkada committed on Jul 24, 2024

Commit a29aacd · verified · 1 parent: 18641ad

Update README.md

Files changed (1)

README.md +3 -3


@@ -238,10 +238,10 @@ The model is based on the Mamba architecture ([Gu et al., 2023](https://arxiv.or
 | **Hyperparameter** | **Value** | **Comment**                            |
 |--------------------|-----------|----------------------------------------|
-| Layers             | 64        |                                        |
-| `d_model`          | 4096      |                                        |
+| Layers             | 64        | Number of layers                       |
+| `d_model`          | 4096      | Hidden dimension                       |
 | `d_state`          | 16        | The SSM state dimension                |
-| Vocabulary         | 65024     |                                        |
+| Vocabulary         | 65024     | Vocabulary Size                        |
 | Sequence length    | 8192      | During stages 4 and LR Decay stage     |
 ## Compute Infrastructure
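
As a reading aid, the hyperparameters in the updated table can be written down as a small Python structure. This is only a sketch: the class name `FalconMambaHyperparams`, its field names, and the `embedding_params` helper are hypothetical and are not taken from the README or from any library. The only derived quantity shown is the size of a single token-embedding table (`Vocabulary` × `d_model`); whether the model ties input and output embeddings is not stated in this diff, so nothing else is inferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FalconMambaHyperparams:
    """Hypothetical container mirroring the README table after this commit."""

    n_layers: int = 64       # "Layers": number of layers
    d_model: int = 4096      # "d_model": hidden dimension
    d_state: int = 16        # "d_state": the SSM state dimension
    vocab_size: int = 65024  # "Vocabulary": vocabulary size
    seq_len: int = 8192      # "Sequence length" as listed in the table

    def embedding_params(self) -> int:
        """Entries in one vocab_size x d_model embedding table (assumption: one table)."""
        return self.vocab_size * self.d_model


hp = FalconMambaHyperparams()
print(hp.embedding_params())  # 65024 * 4096 = 266338304
```

The dataclass is frozen so the values cannot be changed accidentally after construction, which suits a table of fixed architecture constants.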