Update README.md
README.md
CHANGED
@@ -13,9 +13,10 @@ license: mit

 Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.

-In DeBERTa V3 we replaced MLM objective with RTD(Replaced Token Detection) objective
+In DeBERTa V3, we replaced the MLM objective with the RTD (Replaced Token Detection) objective introduced by ELECTRA for pre-training, along with some innovations to be introduced in our upcoming paper. Compared to DeBERTa-V2, our V3 version significantly improves model performance on downstream tasks. You can find a brief introduction to the model in appendix A11 of our original [paper](https://arxiv.org/abs/2006.03654), and we will provide more details in a separate write-up.
+
+The DeBERTa V3 small model comes with 6 layers and a hidden size of 768. It has 143M parameters in total, of which about 98M are in the Embedding layer because the vocabulary contains 128K tokens. This model was trained using the same 160GB of data as DeBERTa V2.

-This is the DeBERTa V3 small model with 6 layers, 768 hidden size. Total parameters is 143M while Embedding layer take about 98M due to the usage of 128k vocabulary. It's trained with 160GB data.

 #### Fine-tuning on NLU tasks

@@ -26,7 +27,7 @@ We present the dev results on SQuAD 1.1/2.0 and MNLI tasks.
 | RoBERTa-base | 91.5/84.6 | 83.7/80.5 | 87.6 |
 | XLNet-base | -/- | -/80.2 | 86.8 |
 |DeBERTa-base |93.1/87.2| 86.2/83.1| 88.8|
-| **DeBERTa-v3-small** | -/- | -/- | 88.
+| **DeBERTa-v3-small** | -/- | -/- | 88.2 |
 | DeBERTa-v3-small+SiFT | -/- | -/- | 88.8 |

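As a quick sanity check of the parameter figures quoted in the updated README text, the sketch below reproduces the Embedding-layer count from the vocabulary size and hidden size. It assumes that "128K" means exactly 128,000 tokens (the README does not give the exact vocabulary size) and that each token embedding is as wide as the hidden size of 768. The constant and function names are illustrative only and are not taken from the DeBERTa code base.

```python
# Rough parameter accounting for the figures quoted in the README.
# Assumption: "128K" is read as exactly 128_000 tokens; the real vocabulary may differ slightly.
VOCAB_SIZE = 128_000
HIDDEN_SIZE = 768
TOTAL_PARAMS = 143_000_000  # "143M", as stated in the README


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """One hidden_size-wide vector per vocabulary entry."""
    return vocab_size * hidden_size


emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
rest = TOTAL_PARAMS - emb  # everything that is not the token embedding table

print(f"embedding parameters: {emb / 1e6:.1f}M")  # 98.3M
print(f"remaining parameters: {rest / 1e6:.1f}M")  # 44.7M
```

Under these assumptions the embedding table accounts for roughly 98M of the 143M total, which is consistent with the README's statement. The remaining 45M or so would cover the transformer layers and other weights.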