Update README.md

README.md CHANGED
@@ -107,7 +107,31 @@ license: mit
## This model can be used for Extractive QA
It has been finetuned for 3 epochs on [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/).
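As a minimal usage sketch (not part of the original card), the model can be loaded with the `transformers` question-answering pipeline; the model id below is a placeholder for this repository's Hub name:

```
# Usage sketch: the model id is a placeholder for this repo's Hub id.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="your-username/deberta-v3-finetuned-squad2",  # placeholder model id
)

result = qa(
    question="What does DeBERTa improve over BERT and RoBERTa?",
    context="DeBERTa improves the BERT and RoBERTa models with disentangled "
            "attention and an enhanced mask decoder.",
    handle_impossible_answer=True,  # SQuAD2.0-style: allow a "no answer" prediction
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```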
## Evaluation on SQuAD2.0 dev set
```
{
    "epoch": 3.0,
    "eval_HasAns_exact": 79.65587044534414,
    "eval_HasAns_f1": 85.91387795001529,
    "eval_HasAns_total": 5928,
    "eval_NoAns_exact": 82.10260723296888,
    "eval_NoAns_f1": 82.10260723296888,
    "eval_NoAns_total": 5945,
    "eval_best_exact": 80.8809904826076,
    "eval_best_exact_thresh": 0.0,
    "eval_best_f1": 84.00551406448994,
    "eval_best_f1_thresh": 0.0,
    "eval_exact": 80.8809904826076,
    "eval_f1": 84.00551406449004,
    "eval_samples": 12508,
    "eval_total": 11873,
    "train_loss": 0.7729689576483615,
    "train_runtime": 9118.953,
    "train_samples": 134891,
    "train_samples_per_second": 44.377,
    "train_steps_per_second": 0.925
}
```
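For reference (not from the original card), the `eval_exact` / `eval_f1` and `HasAns_*` / `NoAns_*` numbers above follow the official SQuAD2.0 metric; a short sketch with the Hugging Face `evaluate` library, assuming the `squad_v2` metric and using made-up example ids and answers:

```
# Sketch of the SQuAD2.0 metric via the `evaluate` library; ids and answers
# below are made-up illustrations, not data from this model's evaluation.
import evaluate

squad_v2 = evaluate.load("squad_v2")

predictions = [
    {"id": "q1", "prediction_text": "disentangled attention", "no_answer_probability": 0.0},
    {"id": "q2", "prediction_text": "", "no_answer_probability": 1.0},  # predicted "no answer"
]
references = [
    {"id": "q1", "answers": {"text": ["disentangled attention"], "answer_start": [42]}},
    {"id": "q2", "answers": {"text": [], "answer_start": []}},  # unanswerable question
]

print(squad_v2.compute(predictions=predictions, references=references))
# Reports exact / f1 plus HasAns_* and NoAns_* splits, as in the results above.
```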
## DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

[DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB of training data.