Text Generation · Transformers · Safetensors · English · falcon_mamba · Eval Results · Inference Endpoints
yellowvm committed on
Commit
3adb85d
1 Parent(s): 2b10f59

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -198,8 +198,8 @@ We evaluate our model on all benchmarks of the new leaderboard's version using t
  | `model name` |`IFEval`| `BBH` |`MATH LvL5`| `GPQA`| `MUSR`|`MMLU-PRO`|`Average`|
  |:--------------------------|:------:|:-----:|:---------:|:-----:|:-----:|:--------:|:-------:|
  | ***Pure SSM models*** | | | | | | | |
- | `FalconMamba-7B` | 33.36 | 19.88 | 3.63 | 8.05 | 10.86 | 14.47 |**15.04**|
- | `TRI-ML/mamba-7b-rw` | 22.46 | 6.71 | 0.45 | 1.12 | 5.51 | 1.69 | 6.25 |
+ | `FalconMamba-7B` | 33.36 | 19.88 | 3.63 | 8.05 | 10.86 | 14.47 |**15.04**|
+ | `TRI-ML/mamba-7b-rw`<sup>*</sup>| 22.46 | 6.71 | 0.45 | 1.12 | 5.51 | 1.69 | 6.25 |
  |***Hybrid SSM-attention models*** | | | | | | |
  |`recurrentgemma-9b` | 30.76 | 14.80 | 4.83 | 4.70 | 6.60 | 17.88 | 13.20 |
  | `Zyphra/Zamba-7B-v1` | 24.06 | 21.12 | 3.32 | 3.03 | 7.74 | 16.02 | 12.55 |
@@ -229,6 +229,8 @@ Also, we evaluate our model on the benchmarks of the first leaderboard using `li
  | `Mistral-7B-v0.1` | 59.98 | 83.31 | 64.16 | 78.37 | 42.15 | 37.83 | 60.97 |
  | `gemma-7B` | 61.09 | 82.20 | 64.56 | 79.01 | 44.79 | 50.87 | 63.75 |
 
+ The evaluation results were borrowed from both leaderboards. For the models with no leaderboard results (marked by *star*), we evaluated the tasks internally.
+
  ## Throughput
 
  This model can achieve comparable throughput and performance compared to other transformer based models that use optimized kernels such as Flash Attention 2. Make sure to install the optimized Mamba kernels with the following commands:
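
The install commands referenced in that last context line fall outside the hunk shown above. As a rough sketch only (the package names below are the usual providers of the fused Mamba kernels, not something this diff confirms), the setup typically looks like:

```bash
# Sketch only: the optimized kernels are commonly provided by these two packages
# (causal-conv1d for the fused convolution kernel, mamba-ssm for the selective-scan kernel).
pip install causal-conv1d mamba-ssm
```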