metadata
license: apache-2.0
datasets:
  - togethercomputer/RedPajama-Data-V2
  - stingning/ultrachat
language:
  - fr
  - en

Mambaoutai 1.6B

Mambaoutai is the result of all the experiments and training runs described in the following blog post, where all details about the model series are shared. Mambaoutai is a series of small Mamba checkpoints released for the community to explore, trained on French, English and code. We ran two different decay phases with the WSD scheduler, and we release model checkpoints pretrained both with and without instruction data.

Usage

You need to install transformers from main until transformers=4.39.0 is released.

pip install git+https://github.com/huggingface/transformers@main

We also recommend installing both causal-conv1d and mamba-ssm:

pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm

If either of these two packages is not installed, the "eager" implementation will be used. Otherwise, the more optimised CUDA kernels will be used.
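To see in advance which path to expect, you can check whether both packages are importable. This is a minimal sketch, not part of transformers: the helper `missing_modules` is a hypothetical name, and finding the packages only indicates that the kernels can be used (a CUDA device is still required).

```python
import importlib.util

# Import names of the two optional packages (pip names: causal-conv1d, mamba-ssm).
OPTIONAL_KERNEL_MODULES = ("causal_conv1d", "mamba_ssm")


def missing_modules(module_names=OPTIONAL_KERNEL_MODULES):
    """Return the names from `module_names` that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print(f"Missing {missing}: expect the eager implementation.")
else:
    print("Both packages found: the optimised CUDA kernels can be used.")
```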

Generation

Use this snippet of code to generate text from the model:

from transformers import AutoTokenizer, MambaForCausalLM

# Set to True for a checkpoint trained with instruction data, False otherwise.
model_has_instruct_data = True

if model_has_instruct_data:
    # use chat tokens
    prompt = "<start_user>Tell me something about Paris.<end_message><start_assistant>"
else:
    # prompt the non-instruction-tuned model gently
    prompt = "This is a text about Paris. Paris is"

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai")
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
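If you prompt the instruction-trained checkpoint often, a small helper keeps the chat tokens consistent. The sketch below only reproduces the single-turn format shown above; the function name `build_chat_prompt` is hypothetical, and multi-turn formatting is not covered because it is not described here.

```python
# Chat tokens exactly as they appear in the single-turn example prompt.
USER_START = "<start_user>"
MESSAGE_END = "<end_message>"
ASSISTANT_START = "<start_assistant>"


def build_chat_prompt(user_message: str) -> str:
    """Wrap one user message in the chat tokens and open the assistant turn."""
    return f"{USER_START}{user_message}{MESSAGE_END}{ASSISTANT_START}"


prompt = build_chat_prompt("Tell me something about Paris.")
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hard-coded prompt.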

Training checkpoints

You can find some of the training checkpoints in the repo branches. Each branch corresponds to the model at some point in time during training.

You can do inference with these training checkpoints by adding the revision parameter to the from_pretrained method. For example, to load the model checkpoint after 30000 steps of pretraining, you can use the following code:

from transformers import AutoTokenizer, MambaForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
model = MambaForCausalLM.from_pretrained("lightonai/mambaoutai", revision="pre-30000")
input_ids = tokenizer("What is a mamba?", return_tensors="pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.batch_decode(out))
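To discover which checkpoint branches exist, you can query the Hub with `huggingface_hub.list_repo_refs`. The sketch below assumes that the other pretraining checkpoints follow the same `pre-<step>` naming as `pre-30000`, which is not confirmed here; the helper names are hypothetical.

```python
import re

# Assumption: pretraining checkpoint branches are named "pre-<step>", like "pre-30000".
CHECKPOINT_PATTERN = re.compile(r"^pre-(\d+)$")


def sort_pretraining_branches(branch_names):
    """Keep branch names of the form pre-<step> and sort them by step number."""
    steps = []
    for name in branch_names:
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            steps.append((int(match.group(1)), name))
    return [name for _, name in sorted(steps)]


def list_pretraining_branches(repo_id="lightonai/mambaoutai"):
    """Fetch the repo's branches from the Hub (needs network access) and filter them."""
    from huggingface_hub import list_repo_refs

    refs = list_repo_refs(repo_id)
    return sort_pretraining_branches(branch.name for branch in refs.branches)
```

Any name returned by `list_pretraining_branches()` can be passed as the `revision` argument of `from_pretrained`.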

Model hyperparameters

More details about the model hyperparameters are given in the table below:

| Parameter | Value |
|---|---|
| d_model | 2688 |
| n_layer | 28 |
| vocab_size | 65024 |
| context_len | 4096 |
| rms_norm | true |
| residual_in_fp32 | true |
| fused_add_norm | true |
| conv_kernel | 4 |
| d_inner | 5376 |
| state_size | 16 |
| dtype | bfloat16 |
| tie_word_embeddings | false |
| non embeddings params | 1.27B |
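
As a sanity check, the parameter counts can be estimated from these hyperparameters. The sketch below makes assumptions that are not stated in the table: `dt_rank = ceil(d_model / 16)` (the usual Mamba default), no bias on the linear projections, a bias on the convolution and on `dt_proj`, and one RMSNorm weight per block plus a final norm.

```python
import math

# Values taken from the hyperparameter table.
d_model = 2688
n_layer = 28
vocab_size = 65024
d_inner = 5376
state_size = 16
conv_kernel = 4
tie_word_embeddings = False

# Assumption: dt_rank is not listed; ceil(d_model / 16) is the usual Mamba default.
dt_rank = math.ceil(d_model / 16)


def mamba_block_params():
    """Estimate the parameter count of one Mamba block (assumed layout, see lead-in)."""
    in_proj = d_model * 2 * d_inner                # projects to x and z, no bias
    conv1d = d_inner * conv_kernel + d_inner       # depthwise conv weights + bias
    x_proj = d_inner * (dt_rank + 2 * state_size)  # produces dt, B and C, no bias
    dt_proj = dt_rank * d_inner + d_inner          # weights + bias
    a_log = d_inner * state_size                   # state matrix A (stored as log)
    d_skip = d_inner                               # skip connection D
    out_proj = d_inner * d_model                   # no bias
    norm = d_model                                 # RMSNorm weight
    return in_proj + conv1d + x_proj + dt_proj + a_log + d_skip + out_proj + norm


# All blocks plus the final norm.
non_embedding = n_layer * mamba_block_params() + d_model
# Input embedding plus a separate output head when embeddings are not tied.
embedding = vocab_size * d_model * (1 if tie_word_embeddings else 2)
total = non_embedding + embedding

print(f"non-embedding params: {non_embedding / 1e9:.2f}B")
print(f"total params:         {total / 1e9:.2f}B")
```

Under these assumptions the estimate comes out at about 1.27B non-embedding parameters and about 1.62B in total, in line with the table and with the 1.6B in the model name.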