---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---
# Model Card for SwarmFormer-Small
SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.
## Model Details
### Model Description
Compact version of SwarmFormer with:
- Token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling and classification
- Optimized for shorter sequences
- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Trained from scratch
### Model Sources
- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: [SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations](https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf) (Takara.ai Research)
- **Demo**: Not available
## Uses
### Direct Use
- Text classification
- Sentiment analysis
- Resource-constrained environments
### Out-of-Scope Use
- Text generation
- Machine translation
- Tasks requiring >256 tokens
- Tasks requiring high precision
## Training Details
### Training Data
- Dataset: IMDB movie review dataset ([stanfordnlp/imdb](https://huggingface.co/datasets/stanfordnlp/imdb))
- Size: 50,000 labeled samples (25,000 train / 25,000 test)
- Data augmentation techniques applied during training
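
As a hedged sketch, the training data can be loaded with the Hugging Face `datasets` library; the card does not specify the actual loading pipeline or the augmentation steps.

```python
from datasets import load_dataset

# IMDB movie review dataset referenced in the card metadata (25k train / 25k test).
imdb = load_dataset("stanfordnlp/imdb")

train_texts = imdb["train"]["text"]
train_labels = imdb["train"]["label"]  # 0 = negative, 1 = positive
```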
### Training Procedure
#### Model Architecture Details
1. **Token Embedding Layer**:
   - Embedding layer (vocab_size → 128)
   - Dropout rate: 0.3
2. **Local Swarm Aggregator**:
   - Input dropout: 0.3
   - Local MLP: Linear(128 → 128) → GELU → Dropout(0.3) → Linear(128 → 128)
   - Gate network with GELU
3. **Clustering Mechanism**:
   - Cluster size: 8 tokens
   - Mean pooling per cluster
4. **Global Cluster Attention**:
   - Q/K/V projections: Linear(128 → 128)
   - Attention dropout: 0.3
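
The components above can be pieced together as a minimal PyTorch sketch of a single SwarmFormer layer. This is illustrative only: the class names (`LocalSwarmAggregator`, `GlobalClusterAttention`, `SwarmFormerLayer`), the gated-residual combination, and the broadcast step are assumptions, not the library's actual implementation.

```python
import torch.nn as nn

D_MODEL, CLUSTER_SIZE, DROPOUT = 128, 8, 0.3


class LocalSwarmAggregator(nn.Module):
    """Local MLP plus GELU gate, mirroring component 2 above (names are illustrative)."""

    def __init__(self, d_model=D_MODEL, dropout=DROPOUT):
        super().__init__()
        self.input_dropout = nn.Dropout(dropout)
        self.local_mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(d_model, d_model),
        )
        self.gate = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        h = self.local_mlp(self.input_dropout(x))
        return x + self.gate(x) * h             # gated residual update (assumed combination)


class GlobalClusterAttention(nn.Module):
    """Scaled dot-product attention over cluster representations (component 4 above)."""

    def __init__(self, d_model=D_MODEL, dropout=DROPOUT):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.attn_dropout = nn.Dropout(dropout)

    def forward(self, c):                       # c: (batch, n_clusters, d_model)
        scores = self.q(c) @ self.k(c).transpose(-2, -1) / (c.size(-1) ** 0.5)
        return self.attn_dropout(scores.softmax(dim=-1)) @ self.v(c)


class SwarmFormerLayer(nn.Module):
    """Local updates, cluster mean pooling, global attention, broadcast back to tokens."""

    def __init__(self, d_model=D_MODEL, cluster_size=CLUSTER_SIZE, t_local=3):
        super().__init__()
        self.local = LocalSwarmAggregator(d_model)
        self.global_attn = GlobalClusterAttention(d_model)
        self.cluster_size = cluster_size
        self.t_local = t_local

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        for _ in range(self.t_local):           # local update steps (T_local = 3)
            x = self.local(x)
        b, s, d = x.shape
        clusters = x.view(b, s // self.cluster_size, self.cluster_size, d).mean(dim=2)
        clusters = self.global_attn(clusters)   # global mixing across clusters
        # Broadcast cluster context back to its member tokens (simplified assumption).
        return x + clusters.repeat_interleave(self.cluster_size, dim=1)
```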
#### Training Hyperparameters
- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30
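
A hedged sketch of how these hyperparameters could be wired into a standard PyTorch training setup follows. The optimizer choice (AdamW) and loss are assumptions not stated in the card; the `SwarmFormerModel` arguments mirror the How to Get Started section below.

```python
import torch
from swarmformer import SwarmFormerModel

# Model configuration taken from the hyperparameters above and the Get Started section.
model = SwarmFormerModel(
    vocab_size=30000,
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3,
)

# Optimizer choice is an assumption; lr and weight decay are the reported values.
optimizer = torch.optim.AdamW(model.parameters(), lr=4.76e-4, weight_decay=0.0541)
criterion = torch.nn.CrossEntropyLoss()  # IMDB sentiment treated as 2-class classification
BATCH_SIZE = 96
```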
## Evaluation
### Results
- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36s (25k samples)
- Mean batch latency: 3.67ms
- Throughput: 45k samples/s
- Peak memory: 8GB
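
For reference, the classification metrics above can be computed from test-set predictions with scikit-learn; the sketch below is generic and assumes binary labels (0 = negative, 1 = positive).

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def classification_metrics(y_true, y_pred):
    """Return the metrics reported above, given test labels and model predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
```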
## Technical Specifications
### Compute Infrastructure
- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8GB minimum
- Training time: 3.6 minutes
### How to Get Started
```python
from swarmformer import SwarmFormerModel

# Instantiate SwarmFormer-Small with the configuration reported above.
model = SwarmFormerModel(
    vocab_size=30000,
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3,
)
```
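
A hedged inference sketch follows. It assumes the instantiated model takes a batch of token-ID tensors and returns class logits; this interface is not documented in the card, and the random IDs are placeholders only.

```python
import torch

# Placeholder input: one sequence of 256 token IDs from the assumed 30k-entry vocabulary.
input_ids = torch.randint(0, 30000, (1, 256))

with torch.no_grad():
    logits = model(input_ids)       # assumed to return sentiment logits, shape (1, 2)

prediction = logits.argmax(dim=-1)  # assumed label order: 0 = negative, 1 = positive
```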
## Citation
```bibtex
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```
## Model Card Authors
Jordan Legg, Mikus Sturmanis, Takara.ai Research Team
## Model Card Contact
research@takara.ai