Update README.md
README.md
CHANGED
@@ -29,6 +29,19 @@ We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.
 |**DeBERTa-XXLarge-V2-mnli**| - | - |**91.7/91.8**| - | - | - | 93.5 | - | - |- |


+## Note
+
+To try the **XXLarge** model with **HF transformers**, you need to specify **--sharded_ddp**
+
+```bash
+
+cd transformers/examples/text-classification/
+
+python -m torch.distributed.launch --nproc_per_node=8 run_glue.py --model_name_or_path microsoft/deberta-xxlarge-v2 \
+--task_name $TASK_NAME --do_train --do_eval --max_seq_length 128 --per_device_train_batch_size 4 \
+--learning_rate 3e-6 --num_train_epochs 3 --output_dir /tmp/$TASK_NAME/ --overwrite_output_dir --sharded_ddp
+```
+
 ### Citation

 If you find DeBERTa useful for your work, please cite the following paper: