BerenMillidge committed 88bc14a (parent: 07bf624): Update README.md

README.md CHANGED
@@ -22,7 +22,6 @@ You can run the model without using the optimized Mamba kernels, but it is **not
 
 To run on CPU, please specify `use_mamba_kernels=False` when loading the model using `AutoModelForCausalLM.from_pretrained`.
 
-
 ### Inference
 
 ```python
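The CPU note above boils down to a single keyword argument at load time. A minimal sketch (the repo id `Zyphra/Zamba2-2.7B-instruct` and the float32 dtype are assumptions, not taken from this diff):

```python
# Minimal CPU-loading sketch: use_mamba_kernels=False disables the
# optimized Mamba kernels so the model can run without a CUDA device.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-2.7B-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # plain float32 on CPU (assumption)
    device_map="cpu",
    use_mamba_kernels=False,    # per the CPU note above
)
```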
@@ -46,7 +45,6 @@ outputs = model.generate(**input_ids, max_new_tokens=150, return_dict_in_generat
 print(tokenizer.decode(outputs[0]))
 ```
 
-
 ## Performance
 
 Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely strong instruction-following benchmark scores, significantly outperforming the similarly sized Gemma2-2B-Instruct and outperforming Mistral-7B-Instruct on most metrics.
@@ -54,11 +52,6 @@ Zamba2-2.7B-Instruct punches dramatically above its weight, achieving extremely
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64e40335c0edca443ef8af3e/wXFMLXZA2-xz2PDyUMwTI.png" width="600"/>
 
 Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
-<center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
-</center>
-
-
 
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------:
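Time to First Token (TTFT) is the latency from submitting a prompt to receiving the first generated token, so timing a one-token `generate` call is a reasonable proxy for the left-hand column above. A rough measurement sketch (assumed repo id and settings, not the benchmark harness behind the plots):

```python
# Rough TTFT probe: time a single-token generate() call, which covers
# prompt prefill plus one decode step. Illustrative only.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-2.7B-instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)

inputs = tokenizer("What factors contributed to the fall of Rome?",
                   return_tensors="pt").to(model.device)

start = time.perf_counter()
model.generate(**inputs, max_new_tokens=1)
print(f"TTFT: {time.perf_counter() - start:.3f} s")
```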