takarajordan committed (verified)
Commit adc1dd4 · 1 Parent(s): 922282b

Create README.md

Files changed (1): README.md (+143 -0)
README.md ADDED
---
datasets:
- stanfordnlp/imdb
language:
- en
---
# Model Card for SwarmFormer-Small

SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.

## Model Details

### Model Description
A compact version of SwarmFormer with (see the sketch after this list):
- Token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling and classification
- Optimized for shorter sequences

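A minimal sketch of how these components could fit together, assuming a PyTorch implementation and a binary (positive/negative) classification head; class and parameter names here are illustrative, not the library's actual API:

```python
import torch.nn as nn

class SwarmFormerSmallSketch(nn.Module):
    def __init__(self, vocab_size, d_model=128, num_layers=2, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)   # token embedding
        self.embed_dropout = nn.Dropout(0.3)                 # embedding dropout
        # Stand-ins for the two SwarmFormer layers detailed under "Training Procedure"
        self.layers = nn.ModuleList([nn.Identity() for _ in range(num_layers)])
        self.classifier = nn.Linear(d_model, num_classes)    # num_classes=2 assumes IMDB sentiment labels

    def forward(self, input_ids):
        x = self.embed_dropout(self.embedding(input_ids))    # (batch, seq_len, d_model)
        for layer in self.layers:
            x = layer(x)                                      # local swarm updates + global cluster attention
        return self.classifier(x.mean(dim=1))                 # mean pool over tokens, then classify
```
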
- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Trained from scratch

### Model Sources
- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: Takara.ai Research
- **Demo**: Not available

## Uses

### Direct Use
- Text classification
- Sentiment analysis
- Resource-constrained environments

### Out-of-Scope Use
- Text generation
- Machine translation
- Tasks requiring >256 tokens
- Tasks requiring high precision

## Training Details

### Training Data
- Dataset: IMDB movie reviews (stanfordnlp/imdb)
- Size: 50,000 samples
- Augmentation techniques applied

### Training Procedure

#### Model Architecture Details
1. **Token Embedding Layer**:
   ```python
   # PyTorch-style restatement of the components listed in this card (illustrative)
   import torch.nn as nn

   embedding = nn.Embedding(vocab_size, 128)  # vocab_size → 128
   embedding_dropout = nn.Dropout(0.3)
   ```

2. **Local Swarm Aggregator**:
   ```python
   input_dropout = nn.Dropout(0.3)
   local_mlp = nn.Sequential(
       nn.Linear(128, 128),
       nn.GELU(),
       nn.Dropout(0.3),
       nn.Linear(128, 128),
   )
   # Gate network with GELU activation (exact layout not specified in this card)
   ```

3. **Clustering Mechanism**:
   - Cluster size: 8 tokens
   - Mean pooling per cluster (see the sketch after this list)

4. **Global Cluster Attention**:
   ```python
   q_proj = nn.Linear(128, 128)
   k_proj = nn.Linear(128, 128)
   v_proj = nn.Linear(128, 128)
   attn_dropout = nn.Dropout(0.3)
   ```

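A rough sketch of how the cluster pooling (step 3) and cluster-level attention (step 4) could be wired together. This is illustrative only: the function name, tensor layout, and the broadcast of cluster outputs back to their member tokens are assumptions, not the reference implementation.

```python
import torch

def global_cluster_attention(x, q_proj, k_proj, v_proj, attn_dropout, cluster_size=8):
    # x: (batch, seq_len, d) token representations; seq_len must be divisible by cluster_size
    B, L, d = x.shape
    clusters = x.view(B, L // cluster_size, cluster_size, d).mean(dim=2)  # mean pool per cluster -> (B, C, d)
    q, k, v = q_proj(clusters), k_proj(clusters), v_proj(clusters)        # Linear(128 → 128) projections
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)      # cluster-to-cluster attention (B, C, C)
    out = attn_dropout(attn) @ v                                          # attended cluster summaries (B, C, d)
    # One plausible way to dispatch cluster information back to member tokens
    return out.repeat_interleave(cluster_size, dim=1)                     # (B, seq_len, d)
```
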
#### Training Hyperparameters
- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30

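These settings translate into a training configuration roughly like the following; the optimizer choice (AdamW) and the placeholder model are assumptions for illustration, and only the numeric values come from the list above:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(128, 2)  # placeholder; substitute the SwarmFormer model from "How to Get Started" below
optimizer = optim.AdamW(model.parameters(), lr=4.76e-4, weight_decay=0.0541)
# Batches of 96 sequences, each of length 256, with dropout 0.30 throughout the network
```
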
## Evaluation

### Results
- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36 s (25k samples)
- Mean batch latency: 3.67 ms
- Throughput: 45k samples/s
- Peak memory: 8 GB

## Technical Specifications

### Compute Infrastructure
- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8 GB minimum
- Training time: 3.6 minutes

### How to Get Started
```python
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,   # tokenizer vocabulary size
    d_model=128,        # embedding dimension
    seq_len=256,        # maximum sequence length
    cluster_size=8,     # tokens per cluster
    num_layers=2,       # SwarmFormer layers
    T_local=3           # local update steps per layer
)
```

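Once constructed, the model can be called on a batch of token IDs. The snippet below is a hypothetical usage example: the tokenization step, the forward signature, and the output shape are assumptions rather than documented behaviour of the `swarmformer` package:

```python
import torch

# Dummy batch of token IDs (2 sequences of length 256); real inputs would come from a tokenizer
input_ids = torch.randint(0, 30000, (2, 256))

model.eval()
with torch.no_grad():
    logits = model(input_ids)      # assumed output shape: (batch, num_classes)
    preds = logits.argmax(dim=-1)  # assumed label order: 0 = negative, 1 = positive
print(preds)
```
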
## Citation

```bibtex
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```

## Model Card Authors
Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

## Model Card Contact
research@takara.ai