Update README.md
@@ -194,54 +194,6 @@ The model training took roughly two months.
# Evaluation

## Benchmarks

We evaluate our model on all benchmarks of the new version of the leaderboard using the `lm-evaluation-harness` package, and then normalize the evaluation results with HuggingFace score normalization.
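As a rough illustration of the normalization idea (the exact per-task baselines and bounds are defined by the leaderboard, so treat this as a simplified sketch, not the leaderboard's implementation): each raw score is rescaled so the random-guessing baseline maps to 0 and a perfect score maps to 100.

```python
def normalize(raw_score: float, random_baseline: float, max_score: float = 1.0) -> float:
    """Rescale a raw accuracy so the random baseline maps to 0 and a
    perfect score maps to 100; scores below the baseline clip to 0."""
    if raw_score <= random_baseline:
        return 0.0
    return (raw_score - random_baseline) / (max_score - random_baseline) * 100.0

# e.g. a 4-way multiple-choice task has a random baseline of 0.25:
print(normalize(0.625, random_baseline=0.25))  # → 50.0
```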
| `model name` | `IFEval` | `BBH` | `MATH LvL5` | `GPQA` | `MUSR` | `MMLU-PRO` | `Average` |
|:--------------------------|:------:|:-----:|:---------:|:-----:|:-----:|:--------:|:-------:|
| ***Pure SSM models*** | | | | | | | |
| `FalconMamba-7B` | 33.36 | 19.88 | 3.63 | 8.05 | 10.86 | 14.47 | **15.04** |
| `TRI-ML/mamba-7b-rw`<sup>*</sup> | 22.46 | 6.71 | 0.45 | 1.12 | 5.51 | 1.69 | 6.25 |
| ***Hybrid SSM-attention models*** | | | | | | | |
| `recurrentgemma-9b` | 30.76 | 14.80 | 4.83 | 4.70 | 6.60 | 17.88 | 13.20 |
| `Zyphra/Zamba-7B-v1`<sup>*</sup> | 24.06 | 21.12 | 3.32 | 3.03 | 7.74 | 16.02 | 12.55 |
| ***Transformer models*** | | | | | | | |
| `Falcon2-11B` | 32.61 | 21.94 | 2.34 | 2.80 | 7.53 | 15.44 | 13.78 |
| `Meta-Llama-3-8B` | 14.55 | 24.50 | 3.25 | 7.38 | 6.24 | 24.55 | 13.41 |
| `Meta-Llama-3.1-8B` | 12.70 | 25.29 | 4.61 | 6.15 | 8.98 | 24.95 | 13.78 |
| `Mistral-7B-v0.1` | 23.86 | 22.02 | 2.49 | 5.59 | 10.68 | 22.36 | 14.50 |
| `Mistral-Nemo-Base-2407 (12B)` | 16.83 | 29.37 | 4.98 | 5.82 | 6.52 | 27.46 | 15.08 |
| `gemma-7B` | 26.59 | 21.12 | 6.42 | 4.92 | 10.98 | 21.64 | **15.28** |
| ***RWKV models*** | | | | | | | |
| `RWKV-v6-Finch-7B`<sup>*</sup> | 27.65 | 9.04 | 1.11 | 2.81 | 2.25 | 5.85 | 8.12 |
| `RWKV-v6-Finch-14B`<sup>*</sup> | 29.81 | 12.89 | 1.13 | 5.01 | 3.16 | 11.30 | 10.55 |
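The `Average` column appears to be the plain arithmetic mean of the six normalized benchmark scores; checking the `FalconMamba-7B` row against the table above:

```python
# Mean of the six normalized benchmark scores for FalconMamba-7B
# (IFEval, BBH, MATH LvL5, GPQA, MUSR, MMLU-PRO):
scores = [33.36, 19.88, 3.63, 8.05, 10.86, 14.47]
average = sum(scores) / len(scores)
print(round(average, 2))  # → 15.04, matching the Average column
```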
We also evaluate our model on the benchmarks of the first version of the leaderboard using `lighteval`.
| `model name` | `ARC` | `HellaSwag` | `MMLU` | `Winogrande` | `TruthfulQA` | `GSM8K` | `Average` |
|:-----------------------------|:------:|:---------:|:-----:|:----------:|:----------:|:-----:|:----------------:|
| ***Pure SSM models*** | | | | | | | |
| `FalconMamba-7B`<sup>*</sup> | 62.03 | 80.82 | 62.11 | 73.64 | 53.42 | 52.54 | **64.09** |
| `TRI-ML/mamba-7b-rw`<sup>*</sup> | 51.25 | 80.85 | 33.41 | 71.11 | 32.08 | 4.70 | 45.52 |
| ***Hybrid SSM-attention models*** | | | | | | | |
| `recurrentgemma-9b`<sup>**</sup> | 52.00 | 80.40 | 60.50 | 73.60 | 38.60 | 42.60 | 57.95 |
| `Zyphra/Zamba-7B-v1`<sup>*</sup> | 56.14 | 82.23 | 58.11 | 79.87 | 52.88 | 30.78 | 60.00 |
| ***Transformer models*** | | | | | | | |
| `Falcon2-11B` | 59.73 | 82.91 | 58.37 | 78.30 | 52.56 | 53.83 | **64.28** |
| `Meta-Llama-3-8B` | 60.24 | 82.23 | 66.70 | 78.45 | 42.93 | 45.19 | 62.62 |
| `Meta-Llama-3.1-8B` | 58.53 | 82.13 | 66.43 | 74.35 | 44.29 | 47.92 | 62.28 |
| `Mistral-7B-v0.1` | 59.98 | 83.31 | 64.16 | 78.37 | 42.15 | 37.83 | 60.97 |
| `Mistral-Nemo-Base-2407 (12B)`<sup>*</sup> | 57.94 | 82.82 | 64.43 | 73.72 | 49.14 | 55.27 | 63.89 |
| `gemma-7B` | 61.09 | 82.20 | 64.56 | 79.01 | 44.79 | 50.87 | 63.75 |
| ***RWKV models*** | | | | | | | |
| `RWKV-v6-Finch-7B`<sup>*</sup> | 43.86 | 75.19 | 41.69 | 68.27 | 42.19 | 19.64 | 48.47 |
| `RWKV-v6-Finch-14B`<sup>*</sup> | 47.44 | 78.86 | 52.33 | 71.27 | 45.45 | 38.06 | 55.57 |
Most evaluation results were taken directly from the two leaderboards. For models marked with one star (<sup>*</sup>) we ran the evaluations internally, while for models marked with two stars (<sup>**</sup>) the results were taken from the corresponding paper or model card.
## Throughput

This model achieves throughput and performance comparable to other transformer-based models that use optimized kernels such as Flash Attention 2. Make sure to install the optimized Mamba kernels with the following commands: