BerenMillidge committed
Commit 630c127
1 Parent(s): 1970b61

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -1,9 +1,9 @@
 ---
 license: apache-2.0
 ---
-# Model Card for Zamba
+# Model Card for Zamba 7B
 
-Zamba-7B-v1 is a hybrid between state-space models (Specifically Mamba) and transformer, and was trained using next-token prediction. Zamba uses a shared transformer layer after every 6 mamba blocks. It uses the Mistral v0.1 tokenizer. We came to this architecture after a series of ablations at small scales. Zamba-7B-v1 was pre-trained on 1T tokens of text and code data, and subsequently, in a second phase, on a mixture of 50B high-quality tokens.
+Zamba-7B-v1 is a hybrid model between Mamba, a state-space model, and transformers. It uses a mamba backbone with a shared transformer layer every 6 blocks. Zamba was trained using next-token prediction. It uses the Mistral v0.1 tokenizer. We came to this architecture after a series of ablations at small scales. Zamba-7B-v1 was pre-trained on 1T tokens of text and code data sourced from open web-datasets. Subsequently in a second phase, Zamba was annealed on a mixture of 50B high-quality tokens.
 
 ## Quick start
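The layer pattern described in the updated model card (a single shared transformer block applied after every 6 Mamba blocks) can be sketched as below. This is a minimal illustration of the interleaving scheme only; the block count, function name, and layer labels are hypothetical and not taken from the released Zamba configuration.

```python
# Illustrative sketch of Zamba-style layer interleaving: a shared
# transformer block is inserted after every 6th Mamba block.
# MAMBA_BLOCKS_PER_SHARED and the labels below are assumptions for
# illustration, not the actual released model config.

MAMBA_BLOCKS_PER_SHARED = 6

def layer_schedule(n_mamba_blocks: int) -> list[str]:
    """Return the block order: Mamba blocks with a shared transformer
    block appended after each group of 6 Mamba blocks."""
    schedule = []
    for i in range(1, n_mamba_blocks + 1):
        schedule.append("mamba")
        if i % MAMBA_BLOCKS_PER_SHARED == 0:
            schedule.append("shared_transformer")
    return schedule

print(layer_schedule(12))
```

With 12 Mamba blocks, the shared transformer block appears twice: once after block 6 and once after block 12.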