pglo committed
Commit 1e05475 (1 parent: 5dd2d3c)

Update README.md

Files changed (1): README.md +8 -0
README.md CHANGED
@@ -45,6 +45,14 @@ outputs = model.generate(**input_ids, max_new_tokens=100)
  print(tokenizer.decode(outputs[0]))
  ```

+ To load a different checkpoint (e.g., iteration 2500), use:
+
+ ```python
+ model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1", device_map="auto", torch_dtype=torch.bfloat16, revision="iter2500")
+ ```
+
+ The default revision is the fully trained model, corresponding to iteration 25156. This is the number of training iterations performed starting from the Zamba phase-1 checkpoint, [Zyphra/Zamba-7B-v1-phase1](https://huggingface.co/Zyphra/Zamba-7B-v1-phase1). See [arXiv:2405.16712](https://arxiv.org/abs/2405.16712) for more details on training.
+
  ## Model Details

  Zamba utilizes a unique hybrid SSM architecture. This architecture consists of a backbone of Mamba layers interspersed with a shared attention layer. This attention layer has shared weights to minimize the parameter cost of the model. We find that concatenating the original model embeddings to the input of this attention block improves performance, likely due to better maintenance of information across depth.
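
To make the architecture description above concrete, here is a minimal, hypothetical PyTorch sketch of the pattern it describes: a backbone of Mamba-style layers with a single shared attention block reused at regular intervals, whose input is the current hidden state concatenated with the original token embeddings. The module names, layer counts, dimensions, and the placeholder backbone layers are assumptions for illustration only, not the released Zamba implementation.

```python
# Minimal sketch (assumed shapes and names) of the hybrid pattern described above:
# a Mamba-style backbone plus ONE shared attention block whose input is the hidden
# state concatenated with the original embeddings. Not the official Zamba code.
import torch
import torch.nn as nn


class SharedAttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Input is [hidden_state ; original_embeddings] -> 2 * d_model wide.
        self.in_proj = nn.Linear(2 * d_model, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, hidden: torch.Tensor, orig_emb: torch.Tensor) -> torch.Tensor:
        x = self.in_proj(torch.cat([hidden, orig_emb], dim=-1))
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        # Residual connection back into the backbone stream.
        return hidden + self.out_proj(attn_out)


class HybridBackbone(nn.Module):
    def __init__(self, d_model: int = 512, n_layers: int = 12, every: int = 6):
        super().__init__()
        # Placeholder "Mamba" layers; a real model would use actual Mamba blocks.
        self.mamba_layers = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_layers)]
        )
        # One set of attention weights, reused at every interleave point.
        self.shared_attn = SharedAttentionBlock(d_model)
        self.every = every

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        hidden = emb
        for i, layer in enumerate(self.mamba_layers):
            hidden = torch.tanh(layer(hidden))
            if (i + 1) % self.every == 0:
                # Same shared attention weights each time, with the original
                # embeddings concatenated to the block's input.
                hidden = self.shared_attn(hidden, emb)
        return hidden


# Example usage with made-up batch/sequence/embedding sizes:
# emb = torch.randn(2, 16, 512)
# out = HybridBackbone()(emb)  # -> shape (2, 16, 512)
```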