PeymanHosseini committed
Commit 2699f75 • Parent(s): 115e671
Update README.md
Updates README (Fixes Grammar Errors)

README.md CHANGED
```diff
@@ -16,14 +16,14 @@ This version of Hummingbird is only meant to demonstrate Efficient Attention for
 
 ## Model Details
 
-The
+The model consists of 1.1 Billion parameters with the following specifications:
 
 | Parameter | size |
-|
+| :------------------- | :--- |
 | # Transformer Blocks | 10 |
 | Model Dimension | 3072 |
 | # Heads | 1 |
 
 
-The Attention Mechanism used is based on our newly proposed Efficient Attention from our paper, *You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism* ([arXiv:2403.01643](https://arxiv.org/abs/2403.01643)). We have chosen the number of
+The Attention Mechanism used is based on our newly proposed Efficient Attention from our paper, *You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism* ([arXiv:2403.01643](https://arxiv.org/abs/2403.01643)). We have chosen the number of heads to be 1 as an interesting case study since all current LMs use multiple heads.
 
```
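For scale, the table's figures are consistent with the stated 1.1 Billion parameter count under common assumptions. Here is a back-of-the-envelope check; note that the 4x feed-forward expansion is an assumption (the README excerpt does not state the feed-forward width, and the Efficient Attention variant would trim the attention share somewhat):

```python
# Rough sanity check of the parameter count implied by the table above.
# Assumptions (not stated in the diff): standard transformer blocks with
# Q/K/V/O projections and a 4x feed-forward expansion; embedding and
# normalization parameters are ignored.
d_model = 3072
n_blocks = 10

attn_params = 4 * d_model**2               # Q, K, V, O projections (standard attention;
                                            # Efficient Attention merges some of these)
mlp_params = 2 * d_model * (4 * d_model)    # up- and down-projections of the MLP
per_block = attn_params + mlp_params        # ~113M parameters per block

total = n_blocks * per_block
print(f"{total / 1e9:.2f}B")                # ~1.13B, in line with the stated 1.1B
```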
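As context for the heads = 1 choice the final paragraph discusses, below is a minimal sketch of plain single-head scaled dot-product attention, the baseline that the paper's Efficient Attention modifies. This is not the Efficient Attention formulation itself; see arXiv:2403.01643 for that. The function name and weight arguments are illustrative only:

```python
import torch
import torch.nn.functional as F

def single_head_attention(x, w_q, w_k, w_v):
    # Plain scaled dot-product attention with one head: there is no split
    # into sub-heads and no concatenation, so the projections act on the
    # full model dimension (3072 here) at once.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v
```

With a single head, attention operates over the full 3072-dimensional space at once rather than over several lower-dimensional subspaces, which is what makes the heads = 1 configuration an unusual case study relative to current multi-head LMs.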