---
license: apache-2.0
language: en
datasets:
- wikipedia
- bookcorpus
model-index:
- name: asi/albert-act-base
  results:
  - task:
      type: text-classification
      name: CoLA
    dataset:
      type: glue
      name: CoLA # General Language Understanding Evaluation benchmark (GLUE)
      split: cola
    metrics:
    - type: matthews_correlation
      value: 33.8
      name: Matthew's Corr
  - task:
      type: text-classification
      name: SST-2
    dataset:
      type: glue
      name: SST-2 # The Stanford Sentiment Treebank
      split: sst2
    metrics:
    - type: accuracy
      value: 88.6
      name: Accuracy
  - task:
      type: text-classification
      name: MRPC
    dataset:
      type: glue
      name: MRPC # Microsoft Research Paraphrase Corpus
      split: mrpc
    metrics:
    - type: accuracy
      value: 79.4
      name: Accuracy
    - type: f1
      value: 85.2
      name: F1
  - task:
      type: text-similarity
      name: STS-B
    dataset:
      type: glue
      name: STS-B # Semantic Textual Similarity Benchmark
      split: stsb
    metrics:
    - type: spearmanr
      value: 81.2
      name: Spearman Corr
    - type: pearsonr
      value: 82.7
      name: Pearson Corr
  - task:
      type: text-classification
      name: QQP
    dataset:
      type: glue
      name: QQP # Quora Question Pairs
      split: qqp
    metrics:
    - type: f1
      value: 67.8
      name: F1
    - type: accuracy
      value: 87.4
      name: Accuracy
  - task:
      type: text-classification
      name: MNLI-m
    dataset:
      type: glue
      name: MNLI-m # MultiNLI Matched
      split: mnli_matched
    metrics:
    - type: accuracy
      value: 79.5
      name: Accuracy
  - task:
      type: text-classification
      name: MNLI-mm
    dataset:
      type: glue
      name: MNLI-mm # MultiNLI Mismatched
      split: mnli_mismatched
    metrics:
    - type: accuracy
      value: 78.5
      name: Accuracy
  - task:
      type: text-classification
      name: QNLI
    dataset:
      type: glue
      name: QNLI # Question NLI
      split: qnli
    metrics:
    - type: accuracy
      value: 88.3
      name: Accuracy
  - task:
      type: text-classification
      name: RTE
    dataset:
      type: glue
      name: RTE # Recognizing Textual Entailment
      split: rte
    metrics:
    - type: accuracy
      value: 61.9
      name: Accuracy
  - task:
      type: text-classification
      name: WNLI
    dataset:
      type: glue
      name: WNLI # Winograd NLI
      split: wnli
    metrics:
    - type: accuracy
      value: 65.1
      name: Accuracy
---
# Adaptive Depth Transformers
Implementation of the paper "How Many Layers and Why? An Analysis of the Model Depth in Transformers". In this study, we investigate the role of multiple layers in deep transformer models. We design a variant of ALBERT that dynamically adapts the number of layers for each token of the input.
## Model architecture
We augment a multi-layer transformer encoder with a halting mechanism, which dynamically adjusts the number of layers for each token.
We directly adapted this mechanism from Graves ([2016](#graves-2016)). At each iteration, we compute a probability for each token to stop updating its state.
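The halting rule can be pictured with a minimal, self-contained sketch. This is a toy illustration, not the pre-training code: we simply assume that each layer iteration produces a halting probability for each token, and that a token stops being updated once its cumulative halting probability exceeds 1 − ε.
```python
import torch

def count_updates(halt_probs, eps=0.01):
    """Toy illustration of the ACT-style halting rule (Graves, 2016).

    halt_probs: tensor of shape (num_layers, seq_len) with the halting
    probability produced for each token at each layer iteration. A token
    stops being updated once its cumulative halting probability exceeds
    1 - eps; the number of iterations it went through is its update count.
    """
    num_layers, seq_len = halt_probs.shape
    cumulative = torch.zeros(seq_len)      # accumulated halting probability per token
    updates = torch.zeros(seq_len)         # number of layer updates per token
    still_running = torch.ones(seq_len, dtype=torch.bool)
    for layer in range(num_layers):
        updates += still_running.float()
        cumulative += torch.where(still_running, halt_probs[layer], torch.zeros(seq_len))
        still_running &= cumulative < 1.0 - eps
    return updates

# Example with random halting probabilities for 24 layer iterations and 16 tokens.
torch.manual_seed(0)
print(count_updates(torch.rand(24, 16) * 0.2))
```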
## Model use
The architecture is not yet directly included in the Transformers library. The code used for pre-training is available in the following [GitHub repository](https://github.com/AntoineSimoulin/adaptive-depth-transformers), so you first need to install the implementation:
```bash
pip install git+https://github.com/AntoineSimoulin/adaptive-depth-transformers
```
Then you can use the model directly.
```python
from act import AlbertActConfig, AlbertActModel, TFAlbertActModel
from transformers import AlbertTokenizer

# Load the tokenizer and the adaptive-depth ALBERT model.
tokenizer = AlbertTokenizer.from_pretrained('asi/albert-act-base')
model = AlbertActModel.from_pretrained('asi/albert-act-base')
_ = model.eval()

inputs = tokenizer("a lump in the middle of the monkeys stirred and then fell quiet .", return_tensors="pt")
outputs = model(**inputs)

# Number of layer updates performed for each token of the input.
outputs.updates
# tensor([[[[15., 9., 10., 7., 3., 8., 5., 7., 12., 10., 6., 8., 8., 9., 5., 8.]]]])
```
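The `updates` tensor counts how many layer iterations each token position went through before halting. As a small usage example (not part of the `act` package), the counts can be aligned with the tokens, assuming the shape `(1, 1, 1, sequence_length)` shown above:
```python
# Align each token with its number of layer updates.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
update_counts = outputs.updates.view(-1).tolist()
for token, n in zip(tokens, update_counts):
    print(f"{token}\t{int(n)} updates")
```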
## Citations
### BibTeX entry and citation info
If you use our iterative transformer model for a scientific publication or an industrial application, please cite the following [paper](https://aclanthology.org/2021.acl-srw.23/):
```bibtex
@inproceedings{simoulin-crabbe-2021-many,
title = "How Many Layers and Why? {A}n Analysis of the Model Depth in Transformers",
author = "Simoulin, Antoine and
Crabb{\'e}, Benoit",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-srw.23",
doi = "10.18653/v1/2021.acl-srw.23",
pages = "221--228",
}
```
### References
><div id="graves-2016">Alex Graves. 2016. Adaptive computation time for recurrent neural networks. CoRR, abs/1603.08983.</div>