Update README.md
README.md
CHANGED
@@ -1,3 +1,17 @@
 ---
 license: apache-2.0
 ---
+
+**Don't use this model for any applied task. It is too small to be practically useful. It is just a part of a weird research project.**
+
+An extremely small version of T5 with these parameters:
+
+```python
+"d_ff": 1024,
+"d_kv": 64,
+"d_model": 256,
+"num_heads": 4,
+"num_layers": 1,  # yes, just one layer
+```
+
+The model was pre-trained on the `realnewslike` subset of C4 for 1 epoch with sequence length `64`. Corresponding WandB run: [click](https://wandb.ai/guitaricet/t5-lm/runs/2yvuxsfz?workspace=user-guitaricet).
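To give a feel for how small these settings are, here is a rough sketch that counts the weight-matrix entries in a single encoder block. It assumes the original T5 layout (bias-free linear layers and a non-gated feed-forward network), which the README does not state. It leaves out embeddings, layer norms, and relative position biases. The helper `block_weight_counts` is hypothetical and not part of the model's code.

```python
# Hyperparameters copied from the README change above.
config = {
    "d_ff": 1024,
    "d_kv": 64,
    "d_model": 256,
    "num_heads": 4,
    "num_layers": 1,
}


def block_weight_counts(cfg):
    """Count weight-matrix entries in one encoder block.

    Assumption: bias-free linear layers and a non-gated feed-forward
    network, as in the original T5. Embeddings, layer norms, and
    relative position biases are not counted.
    """
    # Combined width of all attention heads.
    inner_dim = cfg["num_heads"] * cfg["d_kv"]
    # Query, key, and value projections (d_model x inner_dim each)
    # plus the output projection (inner_dim x d_model).
    attention = 4 * cfg["d_model"] * inner_dim
    # Two feed-forward matrices: d_model -> d_ff and d_ff -> d_model.
    feed_forward = 2 * cfg["d_model"] * cfg["d_ff"]
    return {
        "inner_dim": inner_dim,
        "attention": attention,
        "feed_forward": feed_forward,
    }


counts = block_weight_counts(config)
print(counts)
```

With these values the combined head width (`num_heads * d_kv`) equals `d_model`, and under the stated assumptions the feed-forward matrices hold twice as many weights as the attention projections.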