Hitesh1501 committed
Commit 314bbe2
1 Parent(s): 09f2aad

Update README.md

Files changed (1)
  1. README.md +38 -17
README.md CHANGED
@@ -6,30 +6,51 @@ license: apache-2.0
  datasets:
  - bookcorpus
  - wikipedia
  ---

  # BERT base model (uncased)

- Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
- [this paper](https://arxiv.org/abs/1810.04805) and first released in
- [this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference
- between english and English.

- Disclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by
- the Hugging Face team.

  ## Model description

- BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it
- was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of
- publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it
- was pretrained with two objectives:
-
- - Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input then run
- the entire masked sentence through the model and has to predict the masked words. This is different from traditional
- recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like
- GPT which internally masks the future tokens. It allows the model to learn a bidirectional representation of the
- sentence.
  - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
  they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
  predict if the two sentences were following each other or not.
@@ -248,4 +269,4 @@ Glue test results:

  <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
  <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>
 
  datasets:
  - bookcorpus
  - wikipedia
+ - trivia_qa
  ---

  # BERT base model (uncased)

+ longformer-base-4096 is a BERT-like model started from the RoBERTa checkpoint and pretrained for MLM on long documents. It supports sequences of up to 4,096 tokens.
+ It was introduced in
+ [this paper](https://arxiv.org/abs/2004.05150) and first released in
+ [this repository](https://github.com/allenai/longformer). Longformer uses a combination of sliding window (local) attention and global attention.
+ Global attention is user-configured based on the task, allowing the model to learn task-specific representations.
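Since the paragraph above says global attention is user-configured per task, here is a minimal usage sketch, not part of this commit: it assumes the `allenai/longformer-base-4096` checkpoint and the common convention of giving only the first token global attention.

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

# Sketch only: checkpoint name and global-attention choice are assumptions.
tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

long_text = " ".join(["Hello world."] * 1000)  # stand-in for a long document
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)

# Sliding-window (local) attention is used everywhere by default; mark the first
# token for global attention so it attends to, and is attended by, all tokens.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, seq_len, 768])
```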


  ## Model description

+ Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length; Longformer's attention mechanism instead scales linearly.
+ The paper also introduces the Longformer-Encoder-Decoder (LED), a Longformer variant that supports long-document generative sequence-to-sequence tasks,
+ and demonstrates its effectiveness on the arXiv summarization dataset.
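To make the quadratic-versus-linear point concrete, here is a small hedged sketch (again not part of the commit); it assumes the window size is exposed as `attention_window` on the Hugging Face `LongformerConfig`, which is how the released checkpoints are typically configured.

```python
from transformers import LongformerConfig

# Sketch: inspect the local-attention geometry of the released config.
config = LongformerConfig.from_pretrained("allenai/longformer-base-4096")
print(config.attention_window)         # per-layer sliding-window sizes (e.g. 512)
print(config.max_position_embeddings)  # room for ~4,096-token inputs

# Cost intuition: full self-attention grows as n * n, sliding-window attention
# as n * w with a fixed window w, i.e. linearly in the sequence length n.
n, w = 4096, 512
print(n * n, n * w)  # 16777216 vs. 2097152
```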
+
+ - "Transformer-based models are unable to process long sequences due to their self-attention operation,
+ which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer
+ with an attention mechanism that scales linearly with sequence length, making it easy to process documents
+ of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard
+ self-attention and combines a local windowed attention with a task motivated global attention. Following
+ prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and
+ achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain
+ Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently
+ outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.
+ We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting long document
+ generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization dataset."
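Because the quoted abstract ends with LED and arXiv summarization, a hedged end-to-end sketch follows; it is not part of this model card, and the `allenai/led-large-16384-arxiv` checkpoint name is an assumption based on the public Allen AI releases.

```python
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Sketch only: LED checkpoint fine-tuned for arXiv summarization (assumed name).
tokenizer = LEDTokenizer.from_pretrained("allenai/led-large-16384-arxiv")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-large-16384-arxiv")

article = "..."  # a long scientific article goes here
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=16384)

# As with the encoder-only model, global attention goes on the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    global_attention_mask=global_attention_mask,
    num_beams=4,
    max_length=256,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```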
  - Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes
  they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to
  predict if the two sentences were following each other or not.
 

  <a href="https://huggingface.co/exbert/?model=bert-base-uncased">
  <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
+ </a>