Update README.md
README.md
```diff
@@ -18,8 +18,7 @@ by [Georges Hark](https://twitter.com/ghark) and [Varuna Jayasiri](https://twitt
 in addition to using relative positions in the attention score calculation by RoPE embeddings,
 adds relative positional information explicitly to value embeddings.
 Specifically, it incorporates the relative positions of the tokens paid attention to.
-RoPER
-Results have shown an improvement over RoPE in a language modeling setting on a 3 billion parameter transformer.
+RoPER has given better performance in some algorithmic tasks, and seems comparable to RoPE in language modeling.
 
 ## Model details
 
```
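For context on the sentence being edited: RoPER extends RoPE by rotating the value embeddings as well. Each value is rotated by its own position before the attention-weighted sum, and the summed result is rotated back by the query position, so the surviving rotation angle encodes the relative offset between key and query. A minimal NumPy sketch of that idea, where `rope_rotate` and `roper_output` are illustrative names, not the repository's actual API:

```python
import numpy as np

def rope_rotate(x, pos, base=10_000.0):
    """RoPE-style rotation: rotate consecutive feature pairs of x
    by angles pos * theta_i, with per-pair frequencies theta_i."""
    d = x.shape[0]                                  # embedding size (even)
    theta = base ** (-np.arange(0, d, 2) / d)       # frequencies per pair
    angles = pos * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin                 # 2-D rotation of each pair
    out[1::2] = x1 * sin + x2 * cos
    return out

def roper_output(attn, values, query_pos):
    """RoPER value path (sketch): rotate each value by its own position,
    take the attention-weighted sum, then rotate back by the query
    position. Rotations compose additively, so value j contributes with
    an effective angle (j - query_pos) * theta_i, i.e. relative position.

    attn: (n,) attention weights; values: (n, d); query_pos: int.
    """
    rotated = np.stack([rope_rotate(v, j) for j, v in enumerate(values)])
    summed = attn @ rotated                         # (d,)
    return rope_rotate(summed, -query_pos)
```

Because the rotations are additive in the angle, rotating by `pos` and then by `-pos` is the identity, which is what lets the query-side rotation cancel everything except the relative offset.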