## GLUE results
We also evaluate the language understanding performance of Uni-Perceiver on the GLUE benchmark. The results are listed below.
Model | MNLI (Acc) | QNLI (Acc) | QQP (F1) | RTE (Acc) | SST-2 (Acc) | MRPC (F1) | CoLA (Acc) |
---|---|---|---|---|---|---|---|
Uni-Perceiver<sub>BASE</sub> | 79.7 | 87.3 | 86.7 | 71.1 | 89.3 | 86.0 | 43.1 |
Uni-Perceiver-MoE<sub>BASE</sub> | 81.5 | 88.2 | 87.8 | 75.8 | 90.9 | 87.1 | 52.2 |
Uni-Perceiver<sub>LARGE</sub> | 82.5 | 89.2 | 87.7 | 73.7 | 91.2 | 90.2 | 52.0 |
Uni-Perceiver-MoE<sub>LARGE</sub> | 85.7 | 91.9 | 89.5 | 78.4 | 93.4 | 91.2 | 57.4 |
All fine-tuning experiments are performed on a single GPU.
We use the hyper-parameters for GLUE tasks from fairseq, listed in the table below.
Hyper-parameter | MNLI | QNLI | QQP | RTE | SST-2 | MRPC | CoLA | STS-B |
---|---|---|---|---|---|---|---|---|
--num-classes | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 1 |
--lr | 5e-6 | 1e-5 | 1e-5 | 1e-5 | 5e-6 | 2e-5 | 2e-5 | 2e-5 |
bsz (batch size) | 128 | 32 | 32 | 32 | 128 | 64 | 64 | 32 |
--total-num-update | 30968 | 33112 | 113272 | 1018 | 5233 | 1148 | 1334 | 1799 |
--warmup-updates | 1858 | 1986 | 6796 | 61 | 314 | 68 | 80 | 107 |
- Following RoBERTa, we fine-tune RTE, STS-B and MRPC starting from the MNLI single-task model, rather than from the baseline pre-trained model.
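
For reference, the table above maps directly onto fairseq's `sentence_prediction` fine-tuning flags; note that `--warmup-updates` is consistently about 6% of `--total-num-update` for every task. Below is a minimal sketch of an RTE run with these values, assuming data binarized to a hypothetical `RTE-bin/` directory. The checkpoint path and `--arch` are placeholders, and the remaining flags follow fairseq's public RoBERTa GLUE recipe rather than the exact Uni-Perceiver training scripts.

```bash
# Minimal sketch, following fairseq's public RoBERTa GLUE recipe.
# RTE-bin/ and checkpoints/mnli_model.pt are hypothetical placeholders;
# per the note above, RTE is initialized from the MNLI single-task model.
# Values taken from the table: --num-classes 2, bsz 32, --lr 1e-5,
# --total-num-update 1018, --warmup-updates 61.
CUDA_VISIBLE_DEVICES=0 fairseq-train RTE-bin/ \
    --restore-file checkpoints/mnli_model.pt \
    --reset-optimizer --reset-dataloader --reset-meters \
    --task sentence_prediction \
    --criterion sentence_prediction \
    --arch roberta_base \
    --num-classes 2 \
    --batch-size 32 \
    --max-positions 512 \
    --init-token 0 --separator-token 2 \
    --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
    --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr 1e-5 \
    --total-num-update 1018 --warmup-updates 61 \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric
```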