BerenMillidge committed
Commit 88bc14a
1 Parent(s): 07bf624

Update README.md

Files changed (1):
  1. README.md +0 -7
README.md CHANGED
@@ -22,7 +22,6 @@ You can run the model without using the optimized Mamba kernels, but it is **not
 
 To run on CPU, please specify `use_mamba_kernels=False` when loading the model using ``AutoModelForCausalLM.from_pretrained``.
 
-
 ### Inference
 
 ```python
@@ -46,7 +45,6 @@ outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generat
 print((tokenizer.decode(outputs[0])))
 ```
 
-
 ## Performance
 
 Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely strong instruction-following benchmark scores, significantly outperforming Gemma2-2B-Instruct of the same size and outperforming Mistral-7B-Instruct in most metrics.
@@ -54,11 +52,6 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64e40335c0edca443ef8af3e/wXFMLXZA2-xz2PDyUMwTI.png" width="600"/>
 
 Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer based models.
-<center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
-</center>
-
-
 
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------:
 