readme
README.md
CHANGED
@@ -26,9 +26,13 @@ It achieves the following results on the test set:
## Model description
The model has about 145 million parameters (6 encoder layers, 6 decoder layers). \
-The model is warm-started from [ccdv/lsg-bart-base-4096-arxiv](https://huggingface.co/ccdv/lsg-bart-base-4096-arxiv), converted to handle long sequences (encoder only) and fine-tuned.
## Intended uses & limitations
## Model description
+The model relies on Local-Sparse-Global attention to handle long sequences:
+![attn](attn.png)
The model has about 145 million parameters (6 encoder layers, 6 decoder layers). \
+The model is warm-started from [ccdv/lsg-bart-base-4096-arxiv](https://huggingface.co/ccdv/lsg-bart-base-4096-arxiv), converted to handle long sequences (encoder only) and fine-tuned. \
+**This model relies on a custom modeling file, so you need to add `trust_remote_code=True` when loading it.**\
+**See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
## Intended uses & limitations
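
To illustrate the `trust_remote_code=True` note added in this change, here is a minimal sketch of loading a checkpoint with the flag set. It is not part of the commit. `load_lsg_checkpoint` is a hypothetical helper. The default model id is an assumption: it is the warm-start checkpoint linked in the README, used as a stand-in because this diff does not name the repository's own model id.

```python
# Keyword arguments shared by the tokenizer and model loaders.
# trust_remote_code=True allows transformers to execute the custom
# modeling file shipped inside the model repository.
LOAD_KWARGS = {"trust_remote_code": True}

# Stand-in id (assumption): the warm-start checkpoint named in the README,
# not necessarily the model this README describes.
DEFAULT_MODEL_ID = "ccdv/lsg-bart-base-4096-arxiv"


def load_lsg_checkpoint(model_id: str = DEFAULT_MODEL_ID):
    """Hypothetical helper: return (tokenizer, model) for an LSG checkpoint."""
    # Imported lazily so this module can be read or imported
    # without transformers being installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, **LOAD_KWARGS)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id, **LOAD_KWARGS)
    return tokenizer, model
```

Calling `tokenizer, model = load_lsg_checkpoint()` downloads files from the Hub, so it needs network access. Because the flag runs code from the model repository, it is worth reviewing that code before enabling it.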