ccdv/lsg-bart-base-16384-arxiv

text2text-generation


ccdv committed on May 9, 2022

Commit 9e45379 · 1 Parent(s): 46085ee

readme

Files changed (1)

README.md +5 -1


@@ -26,9 +26,13 @@ It achieves the following results on the test set:
 ## Model description
 The model has about ~145 millions parameters (6 encoder layers - 6 decoder layers). \
-The model is warm started from [ccdv/lsg-bart-base-4096-arxiv](https://huggingface.co/ccdv/lsg-bart-base-4096-arxiv), converted to handle long sequences (encoder only) and fine tuned.
 ## Intended uses & limitations

 ## Model description
+The model relies on Local-Sparse-Global attention to handle long sequences:
+![attn](attn.png)
 The model has about ~145 millions parameters (6 encoder layers - 6 decoder layers). \
+The model is warm started from [ccdv/lsg-bart-base-4096-arxiv](https://huggingface.co/ccdv/lsg-bart-base-4096-arxiv), converted to handle long sequences (encoder only) and fine tuned. \
+**This model relies on a custom modeling file, you need to add trust_remote_code=True**\
+**See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
 ## Intended uses & limitations
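
The note added in this commit says the model needs `trust_remote_code=True` because it relies on a custom modeling file. A minimal sketch of what that could look like when loading the checkpoint is given below. It assumes the `transformers` library is installed and that `AutoTokenizer` and `AutoModelForSeq2SeqLM` are the appropriate entry points for this checkpoint; the diff does not say which classes to use. The helper name `load_lsg_bart` is hypothetical and exists only for illustration.

```python
# Repository id taken from the page header.
MODEL_ID = "ccdv/lsg-bart-base-16384-arxiv"


def load_lsg_bart(model_id: str = MODEL_ID):
    """Load the tokenizer and model, allowing the repo's custom modeling code.

    Hypothetical helper: the choice of Auto classes is an assumption,
    not something stated in the commit.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # trust_remote_code=True lets transformers execute the modeling file
    # shipped in the model repository instead of a built-in architecture.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer, model
```

Calling `tokenizer, model = load_lsg_bart()` downloads the checkpoint and runs code from the model repository, so it is worth reviewing the modeling file before enabling the flag.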