---
tags:
  - Llamba
  - recurrent-models
  - distillation
  - cartesia
  - edge
license: apache-2.0
library_name: cartesia-pytorch
datasets:
  - ai2_arc
  - PIQA
  - Winogrande
  - HellaSwag
  - Lambada
  - MMLU
  - OpenBookQA
inference: 
  precision: bf16
  hardware: gpu
---

# Llamba Models

The Llamba models are part of Cartesia's [Edge](https://github.com/cartesia-ai/edge) library, designed for efficient, high-performance machine learning applications.

For more details, refer to the [paper](https://arxiv.org/abs/2502.14458).

---
## Usage

### Llamba on PyTorch

To use Llamba with PyTorch:

1. Install the required package (the `--no-binary :all:` flag forces pip to build from source):

   ```bash
   pip install --no-binary :all: cartesia-pytorch
   ```
2. Load and run the model:

   ```python
   from transformers import AutoTokenizer
   from cartesia_pytorch.Llamba.llamba import LlambaLMHeadModel

   # Load the pretrained weights and move the model to the GPU.
   model = LlambaLMHeadModel.from_pretrained("cartesia-ai/Llamba-1B", strict=True).to("cuda")

   # Llamba-1B uses the Llama-3.2-1B tokenizer.
   tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

   # Tokenize a prompt and generate a completion of up to 100 tokens.
   input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("cuda")
   output = model.generate(input_ids, max_length=100)[0]
   print(tokenizer.decode(output, skip_special_tokens=True))
   ```

### Llamba on MLX

To run Llamba on Apple silicon with the Metal framework, see [cartesia-metal](https://github.com/cartesia-ai/edge/tree/main/cartesia-metal).

---
## Evaluations

The Llamba models were evaluated on standard downstream benchmarks and maintain strong accuracy while offering the efficiency of recurrent models. Abbreviations: ARC-C/ARC-E = ARC-Challenge/ARC-Easy, WG = Winogrande, HS = HellaSwag, LMB = LAMBADA, OBQA = OpenBookQA. All scores are accuracy (%):

| Model      | ARC-C (0-shot) | ARC-C (25-shot) | ARC-E (0-shot) | ARC-E (25-shot) | PIQA (0-shot) | PIQA (10-shot) | WG (0-shot) | WG (5-shot) |
|------------|---------------|----------------|---------------|----------------|---------------|---------------|------------|------------|
| Llamba-1B  | 37.2          | 41.8           | 69.5          | 71.2           | 74.0          | 74.3          | 60.6       | 58.1       |
| Llamba-3B  | 48.5          | 53.0           | 79.0          | 81.1           | 78.6          | 79.5          | 70.4       | 72.4       |
| Llamba-8B  | 54.6          | 60.0           | 82.5          | 85.8           | 80.9          | 81.5          | 73.3       | 76.9       |

| Model      | HS (0-shot) | HS (10-shot) | LMB (0-shot) | LMB (10-shot) | MMLU (0-shot) | MMLU (5-shot) | OBQA (0-shot) | OBQA (10-shot) |
|------------|------------|------------|------------|------------|------------|------------|------------|------------|
| Llamba-1B  | 61.2       | 60.2       | 48.4       | 39.0       | 38.0       | 31.3       | 37.0       | 38.0       |
| Llamba-3B  | 73.8       | 74.3       | 65.8       | 60.0       | 52.7       | 50.3       | 42.8       | 42.8       |
| Llamba-8B  | 77.6       | 78.7       | 69.4       | 65.0       | 61.0       | 60.0       | 43.4       | 45.8       |
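As a rough illustration of how 0-shot numbers like those above are typically computed, the sketch below scores one multiple-choice item by log-likelihood: each candidate answer is appended to the prompt, and the choice whose tokens receive the highest total log-probability wins. It reuses `model` and `tokenizer` from the PyTorch example; the sample question is a made-up placeholder, and the assumption that the forward pass returns an object with a `.logits` field follows the usual `*LMHeadModel` convention rather than documented cartesia-pytorch behavior.

```python
import torch

# Hypothetical ARC-style item; real evaluations iterate over the whole dataset.
question = "Question: Which force pulls objects toward Earth?\nAnswer:"
choices = [" gravity", " magnetism", " friction", " inertia"]

@torch.no_grad()
def continuation_logprob(prompt: str, continuation: str) -> float:
    """Total log-probability the model assigns to `continuation` given `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids.to("cuda")
    logits = model(full_ids).logits  # assumed shape: (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # position i predicts token i+1
    targets = full_ids[:, prompt_len:]  # the continuation tokens
    picked = log_probs[0, prompt_len - 1 : full_ids.shape[1] - 1]
    return picked.gather(-1, targets[0].unsqueeze(-1)).sum().item()

scores = [continuation_logprob(question, c) for c in choices]
print("Predicted answer:", choices[scores.index(max(scores))].strip())
```

In practice, benchmark results like those in the tables are produced with a standard harness (such as EleutherAI's lm-evaluation-harness) rather than hand-rolled scoring, but this log-likelihood comparison is the core of each multiple-choice evaluation.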

More details on model performance, benchmarks, and evaluation metrics can be found in the [paper](https://arxiv.org/abs/2502.14458).