Update README.md
README.md
CHANGED
@@ -9,27 +9,48 @@ model-index:
 results: []
 ---
 
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # roberta-base-squad2
 
 This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the squad_v2 dataset.
 
-## Model description
-
-More information needed
-
-## Intended uses & limitations
-
-More information needed
-
 ## Training and evaluation data
 
-
 
 ## Training procedure
 
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -48,7 +69,32 @@ The following hyperparameters were used during training:
 
 ### Training results
 
-
 
 ### Framework versions
 
@@ -9,27 +9,48 @@ model-index:
 results: []
 ---
 
 # roberta-base-squad2
 
 This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the squad_v2 dataset.
 
 ## Training and evaluation data
 
+Trained and evaluated on the [squad_v2 dataset](https://huggingface.co/datasets/squad_v2).
 
 ## Training procedure
 
+Trained on 16 Graphcore Mk2 IPUs using [optimum-graphcore](https://github.com/huggingface/optimum-graphcore).
+
+Command line:
+
+```
+python examples/question-answering/run_qa.py \
+--ipu_config_name Graphcore/roberta-base-ipu \
+--model_name_or_path roberta-base \
+--dataset_name squad_v2 \
+--version_2_with_negative \
+--do_train \
+--do_eval \
+--num_train_epochs 3 \
+--per_device_train_batch_size 4 \
+--per_device_eval_batch_size 2 \
+--pod_type pod16 \
+--learning_rate 7e-5 \
+--max_seq_length 384 \
+--doc_stride 128 \
+--seed 1984 \
+--lr_scheduler_type linear \
+--loss_scaling 64 \
+--weight_decay 0.01 \
+--warmup_ratio 0.2 \
+--logging_steps 1 \
+--save_steps -1 \
+--dataloader_num_workers 64 \
+--output_dir roberta-base-squad2 \
+--overwrite_output_dir \
+--push_to_hub
+```
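The `--max_seq_length 384` and `--doc_stride 128` flags in the command control how a context that does not fit in one model input is split into overlapping windows. The following is a minimal sketch of that windowing, not the code in `run_qa.py`. It assumes that `doc_stride` is the number of context tokens shared by consecutive windows (as with the `stride` argument of Hugging Face tokenizers) and that a RoBERTa question/context pair uses 4 special tokens. The helper name `window_spans` and its parameters are hypothetical.

```python
def window_spans(n_context, n_question, max_seq_length=384, doc_stride=128, n_special=4):
    """Return (start, end) context-token spans, one per window (hypothetical helper).

    Assumption: each window holds the full question, n_special special tokens,
    and as many context tokens as still fit; consecutive windows overlap by
    doc_stride context tokens.
    """
    room = max_seq_length - n_question - n_special  # context tokens per window
    if room <= doc_stride:
        raise ValueError("question too long for this max_seq_length/doc_stride")

    spans = []
    start = 0
    while True:
        end = min(start + room, n_context)
        spans.append((start, end))
        if end == n_context:  # the last window reaches the end of the context
            break
        start += room - doc_stride  # advance, keeping doc_stride tokens of overlap
    return spans


if __name__ == "__main__":
    # A 1000-token context with a 20-token question needs several windows.
    for start, end in window_spans(n_context=1000, n_question=20):
        print(start, end)
```

Under these assumptions a long context yields more than one input feature per question, so the number of preprocessed samples in a run can be larger than the number of questions in the dataset.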
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
@@ -48,7 +69,32 @@ The following hyperparameters were used during training:
 
 ### Training results
 
+```
+***** train metrics *****
+epoch = 3.0
+train_loss = 0.9982
+train_runtime = 0:04:44.21
+train_samples = 131823
+train_samples_per_second = 1391.43
+train_steps_per_second = 5.425
+
+***** eval metrics *****
+epoch = 3.0
+eval_HasAns_exact = 78.1208
+eval_HasAns_f1 = 84.6569
+eval_HasAns_total = 5928
+eval_NoAns_exact = 82.0353
+eval_NoAns_f1 = 82.0353
+eval_NoAns_total = 5945
+eval_best_exact = 80.0809
+eval_best_exact_thresh = 0.0
+eval_best_f1 = 83.3442
+eval_best_f1_thresh = 0.0
+eval_exact = 80.0809
+eval_f1 = 83.3442
+eval_samples = 12165
+eval_total = 11873
+```
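The reported metrics can be cross-checked with a little arithmetic. The sketch below copies the numbers from the block above and recomputes the overall exact-match and F1 scores as question-count-weighted averages of the HasAns and NoAns subsets. It also recomputes training throughput, assuming it counts every training sample once per epoch. The names `pooled` and `throughput` are hypothetical helpers, not part of any library.

```python
# Values copied from the train/eval metrics above.
has_ans = {"exact": 78.1208, "f1": 84.6569, "total": 5928}
no_ans = {"exact": 82.0353, "f1": 82.0353, "total": 5945}
train_samples = 131823
epochs = 3
runtime_s = 4 * 60 + 44.21  # train_runtime 0:04:44.21 in seconds


def pooled(metric):
    """Weighted average of a metric over the HasAns and NoAns subsets."""
    total = has_ans["total"] + no_ans["total"]
    weighted = has_ans[metric] * has_ans["total"] + no_ans[metric] * no_ans["total"]
    return weighted / total


def throughput():
    """Samples per second, assuming each sample is seen once per epoch."""
    return train_samples * epochs / runtime_s


if __name__ == "__main__":
    print("questions:", has_ans["total"] + no_ans["total"])
    print("exact:", round(pooled("exact"), 4))
    print("f1:", round(pooled("f1"), 4))
    print("samples/s:", round(throughput(), 2))
```

The recomputed scores agree with `eval_exact` and `eval_f1` up to rounding, and the subset counts add up to `eval_total`. The recomputed throughput lands within a small fraction of a sample per second of the reported figure; the remaining gap is plausibly due to the rounded runtime.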
 
 ### Framework versions
 