Update README.md
README.md
@@ -198,8 +198,8 @@ We evaluate our model on all benchmarks of the new leaderboard's version using t
 | `model name`                     |`IFEval`| `BBH` |`MATH LvL5`| `GPQA`| `MUSR`|`MMLU-PRO`|`Average`|
 |:---------------------------------|:------:|:-----:|:---------:|:-----:|:-----:|:--------:|:-------:|
 | ***Pure SSM models***            |        |       |           |       |       |          |         |
-| `FalconMamba-7B`
-| `TRI-ML/mamba-7b-rw
+| `FalconMamba-7B`                 | 33.36  | 19.88 |   3.63    |  8.05 | 10.86 |  14.47   |**15.04**|
+| `TRI-ML/mamba-7b-rw`<sup>*</sup> | 22.46  |  6.71 |   0.45    |  1.12 |  5.51 |   1.69   |   6.25  |
 |***Hybrid SSM-attention models*** |        |       |           |       |       |          |         |
 |`recurrentgemma-9b`               | 30.76  | 14.80 |   4.83    |  4.70 |  6.60 |  17.88   |  13.20  |
 | `Zyphra/Zamba-7B-v1`             | 24.06  | 21.12 |   3.32    |  3.03 |  7.74 |  16.02   |  12.55  |
@@ -229,6 +229,8 @@ Also, we evaluate our model on the benchmarks of the first leaderboard using `li
 | `Mistral-7B-v0.1`         | 59.98  | 83.31 |   64.16   | 78.37 | 42.15 |  37.83   |  60.97  |
 | `gemma-7B`                | 61.09  | 82.20 |   64.56   | 79.01 | 44.79 |  50.87   |  63.75  |
 
+The evaluation results were borrowed from both leaderboards. For the models with no leaderboard results (marked by *star*), we evaluated the tasks internally.
+
 ## Throughput
 
 This model can achieve comparable throughput and performance compared to other transformer based models that use optimized kernels such as Flash Attention 2. Make sure to install the optimized Mamba kernels with the following commands:
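A minimal sketch, assuming the optimized kernels come from the publicly released `causal-conv1d` and `mamba-ssm` packages (a CUDA toolchain is required to build them):

```bash
# Sketch only: install the CUDA kernels commonly used for Mamba-based models.
# The package names and version bound are assumptions, not quoted from this commit.
pip install "causal-conv1d>=1.4.0" mamba-ssm
```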
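For the starred entries in the tables above that were evaluated internally, one plausible way to approximate the new leaderboard's benchmarks with `lm-evaluation-harness` is sketched below; the `leaderboard` task group, model id, and flags are assumptions, not the exact command used:

```bash
# Illustrative sketch, not the authors' exact setup: run the Open LLM
# Leaderboard v2 task group from lm-evaluation-harness on a Hugging Face model.
lm_eval --model hf \
  --model_args pretrained=tiiuae/falcon-mamba-7b,dtype=bfloat16 \
  --tasks leaderboard \
  --batch_size auto \
  --output_path results/falcon-mamba-7b
```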