Create deberta xlarge 750M model

Files changed (5)

README.md +38 -0
bpe_encoder.bin +3 -0
config.json +18 -0
pytorch_model.bin +3 -0
tokenizer_config.json +3 -0

README.md ADDED

	@@ -0,0 +1,38 @@

+---
+thumbnail: https://huggingface.co/front/thumbnails/microsoft.png
+license: mit
+---
+## DeBERTa: Decoding-enhanced BERT with Disentangled Attention
+[DeBERTa](https://arxiv.org/abs/2006.03654) improves on the BERT and RoBERTa models using disentangled attention and an enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB of training data.
+Please check the [official repository](https://github.com/microsoft/DeBERTa) for more details and updates.
+This is the DeBERTa xlarge model with 48 layers and a hidden size of 1024. Total parameters: 750M.
+#### Fine-tuning on NLU tasks
+We present the dev results on SQuAD 1.1/2.0 (F1/EM) and several GLUE benchmark tasks.
+| Model             | SQuAD 1.1 | SQuAD 2.0 | MNLI-m | SST-2 | QNLI | CoLA | RTE  | MRPC | QQP  |STS-B|
+|-------------------|-----------|-----------|--------|-------|------|------|------|------|------|-----|
+| BERT-Large        | 90.9/84.1 | 81.8/79.0 | 86.6   | 93.2  | 92.3 | 60.6 | 70.4 | 88.0 | 91.3 |90.0 |
+| RoBERTa-Large     | 94.6/88.9 | 89.4/86.5 | 90.2   | 96.4  | 93.9 | 68.0 | 86.6 | 90.9 | 92.2 |92.4 |
+| XLNet-Large       | 95.1/89.7 | 90.6/87.9 | 90.8   | 97.0  | 94.9 | 69.0 | 85.9 | 90.8 | 92.3 |92.5 |
+| **DeBERTa-Large** | 95.5/90.1 | 90.7/88.0 | 91.1   | 96.5  | 95.3 | 69.5 | 88.1 | 92.5 | 92.3 |92.5 |
+### Citation
+If you find DeBERTa useful for your work, please cite the following paper:
+``` latex
+@misc{he2020deberta,
+    title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
+    author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
+    year={2020},
+    eprint={2006.03654},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```

bpe_encoder.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e7c6f9eecb461c01e09c00656ccf3e27944b9e74bfe29e51632b13d3cd9d6c8e
+size 3917897
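
The three added lines above are a Git LFS pointer, not the binary itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. As a minimal sketch of how a downloaded copy could be checked against such a pointer, the snippet below parses the pointer text and compares size and digest. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this commit or of any Git LFS tooling.

```python
import hashlib
import os

# Pointer text as added in this commit (without the leading "+" diff markers).
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e7c6f9eecb461c01e09c00656ccf3e27944b9e74bfe29e51632b13d3cd9d6c8e
size 3917897
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as fh:
        # Read in chunks so large model files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("bpe_encoder.bin", parse_lfs_pointer(POINTER))` on a local download would then report whether the file matches the pointer; the same check applies to the `pytorch_model.bin` pointer further down.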

config.json ADDED

	@@ -0,0 +1,18 @@

+{
+	"attention_probs_dropout_prob": 0.1,
+	"hidden_act": "gelu",
+	"hidden_dropout_prob": 0.1,
+	"hidden_size": 1024,
+	"initializer_range": 0.02,
+	"intermediate_size": 4096,
+	"max_position_embeddings": 512,
+	"relative_attention": true,
+	"pos_att_type": "c2p|p2c",
+	"layer_norm_eps": 1e-7,
+	"max_relative_positions": -1,
+	"position_biased_input": false,
+	"num_attention_heads": 16,
+	"num_hidden_layers": 48,
+	"type_vocab_size": 0,
+	"vocab_size": 50265
+}
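
As a rough sanity check on the "750M" figure in the README, the config values above can be turned into a weights-only parameter estimate. This is a sketch under stated assumptions, not the model's actual layout: it counts the token embedding matrix, four hidden-by-hidden attention projections per layer, the two feed-forward matrices, and (as an assumption not stated in this commit) one extra hidden-by-hidden projection per entry in `pos_att_type` for relative attention. Biases, layer-norm weights, and relative-position embeddings are ignored. The function names are hypothetical.

```python
import json

# Subset of the config.json fields added in this commit.
CONFIG = json.loads("""
{
  "hidden_size": 1024,
  "intermediate_size": 4096,
  "num_attention_heads": 16,
  "num_hidden_layers": 48,
  "relative_attention": true,
  "pos_att_type": "c2p|p2c",
  "vocab_size": 50265
}
""")


def head_size(cfg):
    """Per-head dimension: hidden size split evenly across attention heads."""
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def estimate_weights(cfg):
    """Weights-only parameter estimate (hypothetical helper; ignores biases and norms)."""
    h = cfg["hidden_size"]
    embeddings = cfg["vocab_size"] * h
    attention = 4 * h * h  # query, key, value, and output projections
    # Assumption: one h x h projection per relative-attention type (c2p, p2c).
    relative = 0
    if cfg.get("relative_attention"):
        relative = len(cfg["pos_att_type"].split("|")) * h * h
    feed_forward = 2 * h * cfg["intermediate_size"]
    per_layer = attention + relative + feed_forward
    return embeddings + cfg["num_hidden_layers"] * per_layer


if __name__ == "__main__":
    print("head size:", head_size(CONFIG))
    print(f"estimated weights: {estimate_weights(CONFIG) / 1e6:.0f}M")
```

Under these assumptions the estimate comes to about 756M, in the same range as the 750M quoted in the README.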

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:34618faca5332b6ebf078c8e243c0f426c3dc39e299821d492530ef16ae81477
+size 1518990915

tokenizer_config.json ADDED

	@@ -0,0 +1,3 @@

+{
+  "do_lower_case": false
+}