Hitesh1501 committed
Commit 314bbe2
1 Parent(s): 09f2aad

Update README.md

Files changed (1)
  1. README.md +38 -17
README.md CHANGED
@@ -6,30 +6,51 @@ license: apache-2.0
  datasets:
  - bookcorpus
  - wikipedia
  ---

  # BERT base model (uncased)

- Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
- [this paper](https://arxiv.org/abs/1810.04805) and first released in
- [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
- between english and English.

- Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
- the Hugging Face team.

  ## Model description

- BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
- was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of
- publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it
- was pretrained with two objectives:
-
- - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then run
- the entire masked sentence through the model and has to predict the masked words. This is different from traditional
- recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like
- GPT which internally masks the future tokens. It allows the model to learn a bidirectional representation of the
- sentence.
  - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
  they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
  predict if the two sentences were following each other or not.
@@ -248,4 +269,4 @@ Glue test results:

  <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
  <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>
 
  datasets:
  - bookcorpus
  - wikipedia
+ - trivia_qa
  ---

  # BERT base model (uncased)

+ longformer-base-4096 is a BERT-like model started from the RoBERTa checkpoint and pretrained for MLM on long documents. It supports sequences of up to 4,096 tokens.
+ It was introduced in
+ [this paper](https://arxiv.org/abs/2004.05150) and first released in
+ [this repository](https://github.com/allenai/longformer). Longformer uses a combination of sliding window (local) attention and global attention.
+ Global attention is user-configured based on the task, allowing the model to learn task-specific representations.
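Since the paragraph above says global attention is user-configured per task, here is a minimal usage sketch, not part of this commit: it assumes the `allenai/longformer-base-4096` checkpoint and the common convention of giving only the first token global attention.

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

# Sketch only: checkpoint name and global-attention choice are assumptions.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = " ".join(["Hello world."] * 1000)  # stand-in for a long document
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Sliding-window (local) attention is used everywhere by default; mark the first
# token for global attention so it attends to, and is attended by, all tokens.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, seq_len, 768])
```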


  ## Model description

+ Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length; Longformer's attention mechanism instead scales linearly.
+ The paper also introduces the Longformer-Encoder-Decoder (LED), a Longformer variant that supports long-document generative sequence-to-sequence tasks,
+ and demonstrates its effectiveness on the arXiv summarization dataset.
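To make the quadratic-versus-linear point concrete, here is a small hedged sketch (again not part of the commit); it assumes the window size is exposed as `attention_window` on the Hugging Face `LongformerConfig`, which is how the released checkpoints are typically configured.

```python
from transformers import LongformerConfig

# Sketch: inspect the local-attention geometry of the released config.
config = LongformerConfig.from_pretrained("allenai/longformer-base-4096")
print(config.attention_window)         # per-layer sliding-window sizes (e.g. 512)
print(config.max_position_embeddings)  # room for ~4,096-token inputs

# Cost intuition: full self-attention grows as n * n, sliding-window attention
# as n * w with a fixed window w, i.e. linearly in the sequence length n.
n, w = 4096, 512
print(n * n, n * w)  # 16777216 vs. 2097152
```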
+
+ - "Transformer-based models are unable to process long sequences due to their self-attention operation,
+ which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer
+ with an attention mechanism that scales linearly with sequence length, making it easy to process documents
+ of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard
+ self-attention and combines a local windowed attention with a task motivated global attention. Following
+ prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and
+ achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain
+ Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently
+ outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.
+ We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting long document
+ generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization dataset."
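Because the quoted abstract ends with LED and arXiv summarization, a hedged end-to-end sketch follows; it is not part of this model card, and the `allenai/led-large-16384-arxiv` checkpoint name is an assumption based on the public Allen AI releases.

```python
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Sketch only: LED checkpoint fine-tuned for arXiv summarization (assumed name).
tokenizer = LEDTokenizer.from_pretrained("allenai/led-large-16384-arxiv")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-large-16384-arxiv")

article = "..."  # a long scientific article goes here
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=16384)

# As with the encoder-only model, global attention goes on the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    global_attention_mask=global_attention_mask,
    num_beams=4,
    max_length=256,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```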
  - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
  they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
  predict if the two sentences were following each other or not.
 

  <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
  <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
+ </a>