---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---
# Model Card for SwarmFormer-Small

SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.

## Model Details

### Model Description
A compact version of SwarmFormer with the following components (sketched in code after this list):
- Token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling and classification
- Optimized for shorter sequences
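
The end-to-end flow can be outlined in a few lines. This is an illustrative PyTorch sketch, not the actual `swarmformer` implementation; the two-class output head is assumed from the binary IMDB task, and the SwarmFormer layers are left as placeholders (their components are detailed under Training Details).

```python
import torch
import torch.nn as nn

class SwarmFormerSmallOutline(nn.Module):
    """Illustrative outline only; the real layers implement local swarm
    updates and global cluster attention."""
    def __init__(self, vocab_size=30000, d_model=128, num_layers=2, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.emb_dropout = nn.Dropout(0.3)
        self.layers = nn.ModuleList([nn.Identity() for _ in range(num_layers)])  # placeholders
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids):                  # (batch, seq_len)
        x = self.emb_dropout(self.embed(input_ids))
        for layer in self.layers:                  # two SwarmFormer layers
            x = layer(x)
        return self.classifier(x.mean(dim=1))      # mean pooling, then classification
```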

- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Trained from scratch

### Model Sources
- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: [SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations](https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf) (Takara.ai Research)
- **Demo**: Not available

## Uses

### Direct Use
- Text classification
- Sentiment analysis
- Resource-constrained environments

### Out-of-Scope Use
- Text generation
- Machine translation
- Inputs longer than 256 tokens
- Tasks requiring high precision

## Training Details

### Training Data
- Dataset: IMDB movie reviews ([stanfordnlp/imdb](https://huggingface.co/datasets/stanfordnlp/imdb)), loadable as shown below
- Size: 50,000 samples
- Data augmentation techniques applied
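
The dataset ID above refers to the Hugging Face Hub copy, which can be loaded with the `datasets` library:

```python
from datasets import load_dataset

# 25,000 training and 25,000 test reviews with binary sentiment labels
imdb = load_dataset("stanfordnlp/imdb")
print(imdb["train"][0])
```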

### Training Procedure

#### Model Architecture Details
1. **Token Embedding Layer**:
   ```python
   import torch.nn as nn

   vocab_size = 30000  # as in the usage example below
   embedding = nn.Embedding(vocab_size, 128)  # token embedding: vocab_size -> 128
   emb_dropout = nn.Dropout(0.3)
   ```

2. **Local Swarm Aggregator**:
   ```python
   input_dropout = nn.Dropout(0.3)
   local_mlp = nn.Sequential(          # local MLP applied to each token
       nn.Linear(128, 128),
       nn.GELU(),
       nn.Dropout(0.3),
       nn.Linear(128, 128),
   )
   # Gate network with GELU; the exact layer sizes are assumed here
   gate = nn.Sequential(nn.Linear(128, 128), nn.GELU())
   ```

3. **Clustering Mechanism**:
   - Cluster size: 8 tokens
   - Mean pooling per cluster (see the tensor sketch after this list)

4. **Global Cluster Attention**:
   ```python
   # Attention over cluster representations
   q_proj = nn.Linear(128, 128)
   k_proj = nn.Linear(128, 128)
   v_proj = nn.Linear(128, 128)
   attn_dropout = nn.Dropout(0.3)
   ```
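
The clustering step can be illustrated with a short tensor sketch. The broadcast of updated cluster vectors back to their tokens is an assumption about the dispatch step, which this card does not spell out:

```python
import torch

cluster_size = 8
x = torch.randn(96, 256, 128)                  # (batch, seq_len, d_model)
b, s, d = x.shape

# Group consecutive tokens into clusters of 8 and mean-pool each cluster
clusters = x.view(b, s // cluster_size, cluster_size, d).mean(dim=2)   # (96, 32, 128)

# After global cluster attention updates `clusters`, each token can receive
# its cluster's representation back by repeating it over the cluster
dispatched = clusters.repeat_interleave(cluster_size, dim=1)           # (96, 256, 128)
```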

#### Training Hyperparameters
- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30
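
These hyperparameters translate directly into a training setup. The sketch below assumes `SwarmFormerModel` (constructed as in "How to Get Started" further down) is a standard `torch.nn.Module`; the optimizer is not named in this card, so AdamW is used purely for illustration:

```python
import torch
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000, d_model=128, seq_len=256,
    cluster_size=8, num_layers=2, T_local=3,
)

# lr and weight decay as reported above; the optimizer choice (AdamW) is an assumption
optimizer = torch.optim.AdamW(model.parameters(), lr=4.76e-4, weight_decay=0.0541)
```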

## Evaluation

### Results
- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36 s (25,000 samples)
- Mean batch latency: 3.67 ms
- Throughput: 45,000 samples/s
- Peak memory: 8 GB
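
The reported F1 is consistent with the precision and recall above (their harmonic mean):

```python
precision, recall = 0.8346, 0.9031
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8675, matching the reported F1 of 86.75%
```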

## Technical Specifications

### Compute Infrastructure
- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8 GB minimum
- Training time: 3.6 minutes

### How to Get Started
```python
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3
)
```
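
A quick smoke test of the instantiated model. The forward signature is assumed here to take a batch of token IDs and return class logits; check the `swarmformer` repository for the exact API:

```python
import torch

input_ids = torch.randint(0, 30000, (1, 256))  # dummy batch: one sequence of 256 token IDs
with torch.no_grad():
    logits = model(input_ids)  # assumed signature; see the repository for details
print(logits.shape)
```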

## Citation

```bibtex
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```

## Model Card Authors
Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

## Model Card Contact
research@takara.ai