---
license: apache-2.0
---

Mamba Bit!

Mamba with vocab size 2 bites again! This time we bite at tiny stories.
I didn't bother preprocessing them at all: during training, the code took a random character offset into the text, converted the text at that offset to a bit string, and fed it to Mamba. This time I didn't forget about the residual connections or the norm. As a result, the model was trained in BF16.
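
The text-to-bits step described above can be sketched roughly as follows. This is only an illustration, not the code from `mambabit.py`: the function names (`text_to_bits`, `bits_to_text`, `random_bit_window`), the UTF-8 encoding, the most-significant-bit-first order, and the window length are all assumptions made for the example.

```python
import random


def text_to_bits(text: str) -> list[int]:
    """Encode text as UTF-8 and return its bits (assumed order: MSB first)."""
    bits = []
    for byte in text.encode("utf-8"):
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits


def bits_to_text(bits: list[int]) -> str:
    """Inverse of text_to_bits; trailing bits that don't fill a byte are dropped."""
    usable = len(bits) - len(bits) % 8
    data = bytearray()
    for i in range(0, usable, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8", errors="replace")


def random_bit_window(text: str, n_chars: int, rng: random.Random) -> list[int]:
    """Pick a random character offset and return that window as a bit list."""
    # Hypothetical sampling scheme: uniform offset, fixed window of n_chars.
    start = rng.randrange(0, max(1, len(text) - n_chars + 1))
    return text_to_bits(text[start:start + n_chars])


if __name__ == "__main__":
    story = "Once upon a time there was a kitten."
    window = random_bit_window(story, n_chars=8, rng=random.Random())
    print(window)                # a list of 0/1 tokens, i.e. vocab size 2
    print(bits_to_text(window))  # the characters that window came from
```

With a vocabulary of only two tokens, each ASCII character costs eight sequence positions, so the model sees sequences eight times longer than a byte-level model would for the same text.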

Training code included. 

Example of running the model from the CLI:

$ python mambabit.py "Run, kitten, run"

Run, kitten, running and jumping. She saw a big tree and thought it would be fun to share the tree. So, she went to the tree and started to climb the tree. She saw a big tree and thought it would be fun to share the tree. So, she went to the tree and saw a big red ball.