---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-V2
- stingning/ultrachat
language:
- fr
- en
metrics:
- accuracy
- perplexity
---
# Mambaoutai 1.6B
Mambaoutai is the result of all the experiments and training runs described in the [following blog post](https://www.lighton.ai/fr/blog/blog-4/passing-the-torch-training-a-mamba-model-for-smooth-handover-54), where all details about the model series are shared. Mambaoutai is a series of small Mamba checkpoints released for the community to explore, trained on French, English, and code. We ran two different decay phases with the WSD scheduler, and release model checkpoints pretrained both with and without instruction data.
## Usage
You need to install `transformers` from the `main` branch until `transformers==4.39.0` is released:
```bash
pip install git+https://github.com/huggingface/transformers@main
```
We also recommend installing both `causal-conv1d` and `mamba-ssm`:
```bash
pip install "causal-conv1d>=1.2.0"
pip install "mamba-ssm>=1.2.0"
```
If either of these two packages is not installed, the slower "eager" implementation will be used (not recommended). Otherwise, the more optimised CUDA kernels will be used.
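As a quick sanity check, you can test whether the optional kernel packages import successfully before loading the model. This is a minimal sketch; the package names are simply those of the two pip packages listed above:

```python
def fast_kernels_available() -> bool:
    """Return True if the optimised CUDA kernel packages are importable."""
    try:
        import causal_conv1d  # noqa: F401
        import mamba_ssm  # noqa: F401
        return True
    except ImportError:
        return False

print("fast kernels:", fast_kernels_available())
```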
### Generation
Use this snippet of code to generate text from the model:
```python
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch
# Set to True if you are using a checkpoint trained with instruction data
model_has_instruct_data = False

if model_has_instruct_data:
    # use chat tokens
    prompt = "<start_user>Tell me something about Paris.<end_message><start_assistant>"
else:
    # prompt the non-instruction-tuned model gently
    prompt = "This is a text about Paris. Paris is"
tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
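For multi-turn use with the instruction-tuned checkpoints, the chat tokens from the snippet above can be wrapped in a small helper. Note this is a sketch based only on the single-turn example shown here; the exact multi-turn template (in particular, whether assistant turns also end with `<end_message>`) is an assumption:

```python
def build_prompt(messages):
    """Format (role, text) pairs with the chat tokens used above.

    Roles are "user" or "assistant"; the trailing <start_assistant>
    cues the model to produce the next assistant reply. The multi-turn
    layout is an assumption extrapolated from the single-turn example.
    """
    parts = [f"<start_{role}>{text}<end_message>" for role, text in messages]
    parts.append("<start_assistant>")
    return "".join(parts)

prompt = build_prompt([("user", "Tell me something about Paris.")])
# -> "<start_user>Tell me something about Paris.<end_message><start_assistant>"
```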
### Training checkpoints
You can find some of the training checkpoints in the branches of this repository, each branch corresponding to a snapshot of the model at some point during training.
You can do inference with these training checkpoints by adding the `revision` parameter to the `from_pretrained` method.
For example, to load the model checkpoint after 30000 steps of pretraining, you can use the following code:
```python
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch
tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
```
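To see which checkpoint revisions are actually available, you can list the repository's branches with `huggingface_hub` (a sketch; it assumes the checkpoint branches follow the `pre-30000`-style naming seen above and requires network access when called):

```python
from huggingface_hub import list_repo_refs


def checkpoint_branches(repo_id: str) -> list[str]:
    """Return the branch names of a model repo, e.g. 'pre-30000'."""
    refs = list_repo_refs(repo_id)
    return sorted(branch.name for branch in refs.branches)


if __name__ == "__main__":
    print(checkpoint_branches("lightonai/mambaoutai"))
```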
### On-device Inference
Since Mambaoutai has only 1.6B parameters, it can be run on a CPU at reasonable speed.
Here is an example of how to run it on llama.cpp:
```bash
# Clone llama.cpp repository and compile it from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
# Create a venv and install dependencies
conda create -n mamba-cpp python=3.10
conda activate mamba-cpp
pip install -r requirements/requirements-convert-hf-to-gguf.txt
# Download the weights, tokenizer, config, tokenizer_config and special_tokens_map from this repo and
# put them in a directory 'Mambaoutai/'
mkdir Mambaoutai
# Convert the weights to GGUF format
python convert-hf-to-gguf.py Mambaoutai
# Run inference with a prompt
./main -m Mambaoutai/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -e -ngl 1
```
### Training Hardware
The model checkpoints with no instruction data have been fully trained on an NVIDIA DGX H100 provided by OVH Cloud, whereas the decay phases with instruction data have been carried out on an HPE Cray with 8xH100 on Orange Cloud Avenue.
The ablation experiments were conducted on 16 nodes (4xA100-40GB) on MeluXina.
### Model hyperparameters
More details about the model hyperparameters are given in the table below:
| Parameter | Value |
|-----------------------|----------|
| d_model | 2688 |
| n_layer | 28 |
| vocab_size | 65024 |
| context_len | 4096 |
| rms_norm | true |
| residual_in_fp32 | true |
| fused_add_norm | true |
| conv_kernel | 4 |
| d_inner | 5376 |
| state_size | 16 |
| dtype | bfloat16 |
| tie_word_embeddings | false |
| non-embedding params  | 1.27B    |
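As a rough cross-check of the non-embedding parameter count, the per-layer sizes implied by the table can be added up. This sketch assumes the standard Mamba block layout (input/output projections, depthwise convolution, Δ/B/C projection, A and D parameters) with `dt_rank = ceil(d_model / 16)`, the default in the reference implementation; treat it as an estimate rather than the exact accounting used by the authors:

```python
import math

# Values from the hyperparameter table above
d_model, n_layer, d_inner, d_state, d_conv = 2688, 28, 5376, 16, 4
dt_rank = math.ceil(d_model / 16)  # 168, assuming the "auto" default

per_layer = (
    d_model * 2 * d_inner                # in_proj (x and gate branches)
    + d_inner * d_conv + d_inner         # depthwise conv1d weight + bias
    + d_inner * (dt_rank + 2 * d_state)  # x_proj -> (dt, B, C)
    + dt_rank * d_inner + d_inner        # dt_proj weight + bias
    + d_inner * d_state + d_inner        # A_log + D
    + d_inner * d_model                  # out_proj
    + d_model                            # RMSNorm weight
)

non_embedding = n_layer * per_layer + d_model  # + final norm
print(f"{non_embedding / 1e9:.2f}B")  # ~1.27B, matching the table
```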