---
license: apache-2.0
datasets:
- tiiuae/falcon-refinedweb
pipeline_tag: text-generation
library_name: openlm
tags:
- mamba
- linear
language:
- en
model-index:
- name: mamba-7b
  results:
  - task:
      type: text-generation
    dataset:
      type: MMLU
      name: MMLU
    metrics:
    - name: accuracy
      type: accuracy
      value: 33.3
      verified: false
  - task:
      type: text-generation
    dataset:
      type: HellaSwag
      name: HellaSwag
    metrics:
    - name: accuracy
      type: accuracy
      value: 77.9
      verified: false
  - task:
      type: text-generation
    dataset:
      type: PIQA
      name: PIQA
    metrics:
    - name: accuracy
      type: accuracy
      value: 81.0
      verified: false
  - task:
      type: text-generation
    dataset:
      type: Winogrande
      name: Winogrande
    metrics:
    - name: accuracy
      type: accuracy
      value: 71.8
      verified: false
  - task:
      type: text-generation
    dataset:
      type: ai2_arc
      name: ARC-E
    metrics:
    - name: accuracy
      type: accuracy
      value: 77.5
      verified: false
  - task:
      type: text-generation
    dataset:
      type: ai2_arc
      name: ARC-C
    metrics:
    - name: accuracy
      type: accuracy
      value: 46.7
      verified: false
---
# Mamba-7B
This is a 7B-parameter model with the [Mamba](https://arxiv.org/abs/2312.00752) architecture, trained for multiple epochs (1.2T tokens in total) on the [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) dataset.
Mamba is a state-space model that, unlike the standard transformer architecture, does not use self-attention. It has shown strong performance on various natural language benchmarks. To date, the largest publicly released pure-Mamba pretrained model is [Mamba-2.8B](https://huggingface.co/state-spaces/mamba-2.8b).
We follow the Mamba-2.8B training recipe and release our version, Mamba-7B.
## Model Details
- **Developed by**: [Toyota Research Institute](https://www.tri.global/our-work/robotics)
- **Model Type**: This is an auto-regressive language model based on the [Mamba](https://arxiv.org/abs/2312.00752) architecture.
- **Dataset**: Trained on 1.2T tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb)
- **Tokenizer**: `EleutherAI/gpt-neox-20b`
- **Library**: [OpenLM](https://github.com/mlfoundations/open_lm/)
- **License**: This model is licensed under [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
| Parameters | Hidden Size | Layers | Vocab Size | Sequence Length |
|------------|-------------|--------| ---------- | --------------- |
| 7B | 4096 | 64 | 50432 | 2048 |
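
As a rough sanity check on the "7B" figure, the sketch below estimates the parameter count from the hidden size, layer count, and vocabulary size in the table. The remaining settings (`d_state=16`, `d_conv=4`, `expand=2`, `dt_rank = ceil(d_model / 16)`, tied input/output embeddings) are the defaults of the reference Mamba implementation and are assumed here; the model card does not state them, so treat the result as an estimate only.

```python
import math


def mamba_param_count(d_model, n_layers, vocab_size,
                      d_state=16, d_conv=4, expand=2, tie_embeddings=True):
    """Estimate the parameter count of a Mamba language model.

    d_state, d_conv, expand, and tie_embeddings are ASSUMED defaults,
    not values taken from this model card.
    """
    d_inner = expand * d_model
    dt_rank = math.ceil(d_model / 16)
    per_layer = (
        d_model * 2 * d_inner                # in_proj (no bias)
        + d_inner * d_conv + d_inner         # depthwise conv1d weight + bias
        + d_inner * (dt_rank + 2 * d_state)  # x_proj (no bias)
        + dt_rank * d_inner + d_inner        # dt_proj weight + bias
        + d_inner * d_state                  # A_log
        + d_inner                            # D (skip connection)
        + d_inner * d_model                  # out_proj (no bias)
        + d_model                            # per-block norm weight
    )
    embedding = vocab_size * d_model
    total = n_layers * per_layer + embedding + d_model  # + final norm weight
    if not tie_embeddings:
        total += embedding                   # separate output projection
    return total


# Values from the table above.
total = mamba_param_count(d_model=4096, n_layers=64, vocab_size=50432)
print(f"approx. {total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands close to 7B, with the input and output projections of each block accounting for most of the weights.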
## Training Details
- Mamba-7B was trained using AWS SageMaker on 128 H100 80GB GPUs.
- Training began in March 2024 and lasted three weeks.
| **Hyperparameter** | **Value** |
|--------------------|------------|
| Precision | `bfloat16` |
| Optimizer | AdamW |
| Learning rate | 3e-4 |
| LR cooldown end | 1e-5 |
| QK-norm | False |
| Warmup steps | 2000 |
| Z-loss | 1e-4 |
| Batch size | 2M |
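
To make the learning-rate entries concrete, here is a minimal sketch of a schedule that uses the peak rate, warmup steps, and cooldown end value from the table. Two things are assumptions rather than facts from this card: the cosine shape of the decay, and reading the batch size of "2M" as 2^21 tokens per step (1024 sequences of 2048 tokens), which is used to derive the total step count from the 1.2T-token budget.

```python
import math

PEAK_LR = 3e-4       # "Learning rate" from the table
END_LR = 1e-5        # "LR cooldown end" from the table
WARMUP_STEPS = 2000  # "Warmup steps" from the table

# ASSUMPTIONS (not stated in the model card):
TOKENS_PER_STEP = 2 * 1024 * 1024  # "2M" read as 2^21 tokens per batch
TOTAL_TOKENS = 1.2e12              # total training tokens
TOTAL_STEPS = round(TOTAL_TOKENS / TOKENS_PER_STEP)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to END_LR (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return END_LR + 0.5 * (PEAK_LR - END_LR) * (1 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>7d}: lr = {lr_at(step):.2e}")
```

The actual schedule used in training may differ in shape; only the three values taken from the table are from the source.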
## Usage
This model was trained using [OpenLM](https://github.com/mlfoundations/open_lm/). The weights have been converted to be compatible with the Hugging Face `transformers` library.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the converted weights, then move the model to the GPU.
tokenizer = AutoTokenizer.from_pretrained("tri-ml/mamba-7b-rw")
model = AutoModelForCausalLM.from_pretrained("tri-ml/mamba-7b-rw").cuda()

# Tokenize the prompt and put the inputs on the same device as the model.
inputs = tokenizer(["A beautiful flower"], return_tensors="pt").to(model.device)

# Sampling settings: nucleus sampling with a mild repetition penalty.
gen_kwargs = {"max_length": 128, "top_p": 0.8, "temperature": 0.8, "do_sample": True, "repetition_penalty": 1.1}
output = model.generate(inputs["input_ids"], **gen_kwargs)
output = tokenizer.decode(output[0].tolist(), skip_special_tokens=True)
print(output)
# Example output (sampling is stochastic, so results will vary):
# A beautiful flower box made of white rose wood. It is a perfect gift for weddings, birthdays and anniversaries.
# All the roses are from our farm Roses Flanders. Therefor you know that these flowers last much longer than those in store or online!
```
## Performance Evaluation
Our evaluations were run with the [Eleuther LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness).
Below we report the performance of Mamba-7B compared to other base models.