Update README.md
README.md
CHANGED
@@ -6,4 +6,18 @@ datasets:
 - roneneldan/TinyStories
 language:
 - en
----
+---
+
+# Dataset
+This model was trained on the TinyStories dataset, specifically the GPT-4-generated version.
+
+# The Model
+The name "Deception" comes from the model's architecture, which combines elements of both Transformer and RNN architectures. This fusion creates a deceptive yet beneficial design.
+
+The model has a context length of 1024 tokens; in theory, this can be extended indefinitely through fine-tuning.
+
+
+
+
+
+Thank you to the creators of RWKV who made all of this possible. Their repo is here: https://github.com/BlinkDL/RWKV-LM
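
The README's remark about extending the context length can be illustrated with a small sketch. This is not the model's code. It is a simplified, scalar version of an RWKV-style weighted average, and the decay `w`, the bonus `u`, and both function names are assumptions made for illustration. It shows that an attention-like sum over all earlier tokens can be rewritten as an RNN-like update that carries only a fixed-size state, which is one reason such a design is not tied to a hard sequence-length limit.

```python
import math


def wkv_full(keys, values, w, u):
    """Attention-like form: at each step, re-weight every earlier token."""
    out = []
    for t in range(len(keys)):
        # The current token gets a bonus u instead of a decay.
        num = math.exp(u + keys[t]) * values[t]
        den = math.exp(u + keys[t])
        for i in range(t):
            # Older tokens are exponentially decayed by their distance.
            weight = math.exp(-(t - 1 - i) * w + keys[i])
            num += weight * values[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(keys, values, w, u):
    """RNN-like form: same outputs, carrying only two running sums."""
    a, b = 0.0, 0.0  # fixed-size state, independent of sequence length
    out = []
    for k, v in zip(keys, values):
        cur = math.exp(u + k)
        out.append((a + cur * v) / (b + cur))
        # Decay the past and fold in the current token for the next step.
        a = math.exp(-w) * a + math.exp(k) * v
        b = math.exp(-w) * b + math.exp(k)
    return out


# Hypothetical toy inputs, chosen only to compare the two forms.
keys = [0.1, -0.3, 0.5, 0.0]
values = [1.0, 2.0, -1.0, 0.5]
full = wkv_full(keys, values, w=0.5, u=0.2)
recurrent = wkv_recurrent(keys, values, w=0.5, u=0.2)
```

Both functions produce the same sequence, but the recurrent one does constant work per token. A production implementation would also need to guard against overflow in `exp`, which this sketch omits.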