pglo committed · Commit afe204f · 1 Parent(s): 0e5139e

Update README.md

Files changed (1): README.md +35 -0
README.md CHANGED
---
license: apache-2.0
---
# Model Card for Zamba

Zamba-7B-v1 is a hybrid between state-space models (specifically Mamba) and transformers, trained with next-token prediction. Zamba uses a shared transformer layer after every 6 Mamba blocks and uses the Mistral v0.1 tokenizer. We arrived at this architecture after a series of ablations at small scales. Zamba-7B-v1 was pre-trained on 1T tokens of text and code data and was subsequently trained, in a second phase, on a mixture of 50B high-quality tokens.
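
Purely as an illustration of this layer-sharing pattern (not Zamba's actual implementation; the class names, dimensions, and block internals below are invented stand-ins), a toy sketch of applying one shared transformer block after every 6 Mamba blocks might look like:

```python
import torch
import torch.nn as nn

class MambaBlock(nn.Module):
    """Stand-in for a real Mamba (SSM) block; here just a residual linear mixer."""
    def __init__(self, dim):
        super().__init__()
        self.mixer = nn.Linear(dim, dim)

    def forward(self, x):
        return x + self.mixer(x)

class SharedTransformerBlock(nn.Module):
    """Stand-in for the shared attention + MLP block."""
    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Linear(dim, dim)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        return x + self.mlp(attn_out)

class ToyHybridBackbone(nn.Module):
    """Applies the SAME transformer block (shared weights) after every 6 Mamba blocks."""
    def __init__(self, dim=64, n_mamba_blocks=12, share_every=6):
        super().__init__()
        self.mamba_blocks = nn.ModuleList([MambaBlock(dim) for _ in range(n_mamba_blocks)])
        self.shared_transformer = SharedTransformerBlock(dim)  # one set of weights, reused
        self.share_every = share_every

    def forward(self, x):
        for i, block in enumerate(self.mamba_blocks, start=1):
            x = block(x)
            if i % self.share_every == 0:
                x = self.shared_transformer(x)
        return x

x = torch.randn(1, 16, 64)           # (batch, sequence, hidden dim)
print(ToyHybridBackbone()(x).shape)  # torch.Size([1, 16, 64])
```

Because the transformer block is a single module reused at every insertion point, its parameters are shared across all of those positions rather than duplicated per layer.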

## Quick start

### Prerequisites

Zamba requires `transformers` version 4.39.0 or higher:
```bash
pip install "transformers>=4.39.0"
```
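
If you want to double-check the installed version, you can do so from Python:

```python
import transformers

# Zamba support requires transformers >= 4.39.0
print(transformers.__version__)
```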

To run the optimized Mamba implementations, you first need to install `mamba-ssm` and `causal-conv1d`:
```bash
pip install mamba-ssm "causal-conv1d>=1.2.0"
```

You also need the model to be on a CUDA device.
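
As a quick sanity check that the optimized path is usable (assuming both packages installed cleanly, and that `mamba_ssm` / `causal_conv1d` are their import names), you can verify that they import and that a CUDA device is visible:

```python
import torch

# Both packages must be importable for the fused kernels to be used
import mamba_ssm       # noqa: F401
import causal_conv1d   # noqa: F401

assert torch.cuda.is_available(), "The optimized Mamba kernels require a CUDA device"
print(torch.cuda.get_device_name(0))
```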

You can run the model without the optimized Mamba kernels, but this is **not** recommended as it will result in significantly higher latency. To do so, specify `use_mamba_kernels=False` when loading the model.
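
For example, loading the model with the kernels disabled would look like this (slower fallback path):

```python
from transformers import AutoModelForCausalLM
import torch

# Fallback path that skips the fused Mamba kernels (noticeably slower)
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba-7B-v1",
    torch_dtype=torch.bfloat16,
    use_mamba_kernels=False,
)
```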

## Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model in bfloat16, letting device_map="auto" place it on the available GPU(s)
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1", device_map="auto", torch_dtype=torch.bfloat16)

# Tokenize the prompt and move the input tensors to the CUDA device
input_text = "A funny prompt would be "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate up to 100 new tokens and decode them back to text
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```