Migrate model card from transformers-repo
Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/shoarora/alectra-small-owt/README.md
# ALECTRA-small-OWT

This is an extension of the
[ELECTRA](https://openreview.net/forum?id=r1xMH1BtvB) small model, trained on the
[OpenWebText corpus](https://skylion007.github.io/OpenWebTextCorpus/).
The training task (discriminative LM / replaced-token detection) can be generalized to any transformer type. Here, we train an ALBERT model under the same scheme.

## Pretraining task

![electra task diagram](https://github.com/shoarora/lmtuners/raw/master/assets/electra.png)
(figure from [Clark et al. 2020](https://openreview.net/pdf?id=r1xMH1BtvB))

ELECTRA is pretrained with discriminative LM / replaced-token detection.
A generator (a masked LM) fills masked positions with sampled tokens, and a discriminator
then classifies each token of the result as original or replaced.
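
The corruption step can be sketched in plain Python. This is a minimal, hypothetical illustration rather than the lmtuners implementation: `MASK_ID`, the 15% default masking rate, and `toy_generator` (a stand-in that samples uniformly from a small vocabulary instead of running a masked LM) are all assumptions. It shows one detail worth keeping in mind: a position counts as replaced only if the sampled token actually differs from the original.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical [MASK] id for illustration (bert-base-uncased uses 103).
MASK_ID = 103

# A generator maps (masked token ids, position) -> proposed token id.
GeneratorFn = Callable[[List[int], int], int]


def corrupt(
    tokens: Sequence[int],
    generator: GeneratorFn,
    rng: random.Random,
    mask_prob: float = 0.15,
) -> Tuple[List[int], List[int]]:
    """Build a replaced-token-detection example from `tokens`.

    Returns (corrupted, labels), where labels[i] is 1 if corrupted[i]
    differs from tokens[i] (replaced) and 0 otherwise (original).
    """
    # 1. Pick the positions to mask.
    positions = [i for i in range(len(tokens)) if rng.random() < mask_prob]
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK_ID

    # 2. The generator (a masked LM in ELECTRA) proposes a token at each masked position.
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = generator(masked, i)

    # 3. Discriminator targets. If the generator happens to reproduce the
    #    original token, that position is labeled "original" (0).
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


def toy_generator(vocab: Sequence[int], rng: random.Random) -> GeneratorFn:
    """Hypothetical stand-in for a masked LM: samples uniformly from `vocab`."""
    def sample(masked: List[int], position: int) -> int:
        return rng.choice(vocab)
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [2023, 2003, 1037, 3231, 6251, 1012]  # arbitrary example ids
    corrupted, labels = corrupt(tokens, toy_generator([1037, 2003, 4937], rng), rng, mask_prob=0.5)
    print(corrupted, labels)
```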

The generator can be any `*ForMaskedLM` model and the discriminator any
`*ForTokenClassification` model, so the task extends to ALBERT models,
not just BERT as in the original paper.
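
To make the `*ForTokenClassification` point concrete: a token-classification head with two labels emits one logit pair (original, replaced) per token, and the discriminator can be trained with per-token cross-entropy against the labels from the corruption step, averaged over non-padding positions. The plain-Python sketch below computes that loss for illustration; the two-label setup and the attention-mask averaging are assumptions about how such a head would be trained, not a description of the exact loss in lmtuners.

```python
import math
from typing import Sequence


def token_cross_entropy(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    attention_mask: Sequence[int],
) -> float:
    """Mean 2-class cross-entropy over positions where attention_mask == 1.

    logits[i] = [score_original, score_replaced] for token i;
    labels[i] is 0 (original) or 1 (replaced).
    """
    total, count = 0.0, 0
    for scores, label, keep in zip(logits, labels, attention_mask):
        if not keep:
            continue  # skip padding tokens
        # log-sum-exp with the max subtracted for numerical stability
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[label]  # -log softmax(scores)[label]
        count += 1
    return total / count if count else 0.0


# Uninformative logits give log(2) per token.
print(token_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1], [1, 1]))
```

In transformers, `*ForTokenClassification` models compute a cross-entropy loss themselves when `labels` are passed; positions labeled -100 (PyTorch's default `ignore_index`) are excluded, which is the usual way to skip padding.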

## Usage

```python
from transformers import AlbertForSequenceClassification, BertTokenizer

# Both of our models (ELECTRA-Small-OWT and ALECTRA-Small-OWT) use the bert-base-uncased tokenizer and vocab.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Weights missing from the checkpoint are newly initialized (transformers logs
# a warning listing them), so fine-tune on a downstream task before use.
alectra = AlbertForSequenceClassification.from_pretrained('shoarora/alectra-small-owt')
```

NOTE: this ALBERT model uses a BERT WordPiece tokenizer rather than ALBERT's usual SentencePiece tokenizer, so load it with `BertTokenizer`, not `AlbertTokenizer`.
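
For readers more used to ALBERT's SentencePiece vocabularies, here is a minimal sketch of the greedy longest-match-first subword split that WordPiece tokenizers such as `bert-base-uncased` apply to each whitespace-separated word. The toy vocabulary is hypothetical, and the real `BertTokenizer` also lowercases, splits punctuation, and maps overly long words to `[UNK]`; all of that is omitted here.

```python
from typing import List, Set


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split; continuation pieces carry a '##' prefix."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "##aff", "##able", "play", "##ing"}  # hypothetical toy vocab
print(wordpiece("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece("playing", vocab))    # ['play', '##ing']
print(wordpiece("xyz", vocab))        # ['[UNK]']
```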

## Code

The PyTorch module that implements this task is available [here](https://github.com/shoarora/lmtuners/blob/master/lmtuners/lightning_modules/discriminative_lm.py).

Further implementation details are available [here](https://github.com/shoarora/lmtuners/tree/master/experiments/disc_lm_small),
and the script that created this model is [here](https://github.com/shoarora/lmtuners/blob/master/experiments/disc_lm_small/train_alectra_small.py).

This specific model was trained with the following hyperparameters:
- `batch_size: 512`
- `training_steps: 5e5`
- `warmup_steps: 4e4`
- `learning_rate: 2e-3`
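
With these values, a common ELECTRA-style schedule warms the learning rate up linearly to its peak and then decays it linearly to zero at the final step. The sketch below assumes exactly that shape; it is not taken from `train_alectra_small.py`, which may use a different decay. It also shows how many sequences the run processes in total (batch size times steps).

```python
BATCH_SIZE = 512
TRAINING_STEPS = int(5e5)
WARMUP_STEPS = int(4e4)
PEAK_LR = 2e-3


def learning_rate(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then linear decay to 0 at TRAINING_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / (TRAINING_STEPS - WARMUP_STEPS)


for step in (0, 20_000, 40_000, 270_000, 500_000):
    print(step, learning_rate(step))

# Total sequences seen over the run.
print(BATCH_SIZE * TRAINING_STEPS)  # 256000000
```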

## Downstream tasks

#### GLUE Dev results

| Model | # Params | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ELECTRA-Small++ | 14M | 57.0 | 91.2 | 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7 |
| ELECTRA-Small-OWT | 14M | 56.8 | 88.3 | 87.4 | 86.8 | 88.3 | 78.9 | 87.9 | 68.5 |
| ELECTRA-Small-OWT (ours) | 17M | 56.3 | 88.4 | 75.0 | 86.1 | 89.1 | 77.9 | 83.0 | 67.1 |
| ALECTRA-Small-OWT (ours) | 4M | 50.6 | 89.1 | 86.3 | 87.2 | 89.1 | 78.2 | 85.9 | 69.6 |

#### GLUE Test results

| Model | # Params | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT-Base | 110M | 52.1 | 93.5 | 84.8 | 85.9 | 89.2 | 84.6 | 90.5 | 66.4 |
| GPT | 117M | 45.4 | 91.3 | 75.7 | 80.0 | 88.5 | 82.1 | 88.1 | 56.0 |
| ELECTRA-Small++ | 14M | 57.0 | 91.2 | 88.0 | 87.5 | 89.0 | 81.3 | 88.4 | 66.7 |
| ELECTRA-Small-OWT (ours) | 17M | 57.4 | 89.3 | 76.2 | 81.9 | 87.5 | 78.1 | 82.4 | 68.1 |
| ALECTRA-Small-OWT (ours) | 4M | 43.9 | 87.9 | 82.1 | 82.0 | 87.6 | 77.9 | 85.8 | 67.5 |
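
To compare rows at a glance, one can take the unweighted mean of the eight task scores in each row. This is only a convenience computed from the numbers in the tables above; it is not the official GLUE score, which among other differences also covers WNLI. The sketch below computes it for the two models trained here.

```python
from statistics import mean

TASKS = ["CoLA", "SST", "MRPC", "STS", "QQP", "MNLI", "QNLI", "RTE"]

# Scores copied from the GLUE dev and test tables above.
SCORES = {
    ("ELECTRA-Small-OWT (ours)", "dev"): [56.3, 88.4, 75.0, 86.1, 89.1, 77.9, 83.0, 67.1],
    ("ALECTRA-Small-OWT (ours)", "dev"): [50.6, 89.1, 86.3, 87.2, 89.1, 78.2, 85.9, 69.6],
    ("ELECTRA-Small-OWT (ours)", "test"): [57.4, 89.3, 76.2, 81.9, 87.5, 78.1, 82.4, 68.1],
    ("ALECTRA-Small-OWT (ours)", "test"): [43.9, 87.9, 82.1, 82.0, 87.6, 77.9, 85.8, 67.5],
}

for (model, split), scores in SCORES.items():
    assert len(scores) == len(TASKS)  # one score per task column
    print(f"{model:26s} {split:4s} {mean(scores):.2f}")
```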