botcon committed
Commit c179832 · 1 Parent(s): 8afcad2

Upload 2 files

Files changed (2)
  1. QuestionAnswering.py +493 -0
  2. README.md +26 -0
QuestionAnswering.py ADDED
@@ -0,0 +1,493 @@
+ from transformers import LukePreTrainedModel, LukeModel, AutoTokenizer, TrainingArguments, default_data_collator, Trainer, AutoModelForQuestionAnswering
+ from transformers.modeling_outputs import ModelOutput
+ from typing import Optional, Tuple, Union
+
+ import numpy as np
+ from tqdm import tqdm
+ import evaluate
+ import torch
+ from dataclasses import dataclass
+ from datasets import load_dataset, concatenate_datasets
+ from torch import nn
+ from torch.nn import CrossEntropyLoss
+ import collections
+ import re
+
+ train = False
+ test = True
+
+ PEFT = False
+ tf32 = True
+ fp16 = True
+
+ trained_model = "LUKE_squad_finetuned_qa_tf32"
+ train_checkpoint = None
+ squad_shift = False
+
+ # For testing
+ tokenizer_list = ["xlnet-base-cased"]
+ model_list = ["botcon/XLNET_squad_finetuned_large"]
+ question_list = ["who", "what", "where", "when", "which", "how", "whom", ".*"]
+
+ # NOTE: base_tokenizer must be compatible with the model in use. The test path below
+ # pairs the XLNet tokenizer with a fine-tuned XLNet model; when training LUKE, use
+ # "roberta-base" instead (see the work-around note in __main__).
+ base_tokenizer = "xlnet-base-cased"
+ base_model = "studio-ousia/luke-base"
+
+ # base_tokenizer = "xlnet-base-cased"
+ # base_model = "xlnet-base-cased"
+
+ # base_tokenizer = "bert-base-cased"
+ # base_model = "SpanBERT/spanbert-base-cased"
+
+ torch.backends.cuda.matmul.allow_tf32 = tf32
+ torch.backends.cudnn.allow_tf32 = tf32
+ device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
+
+ # https://github.com/huggingface/transformers/blob/v4.34.1/src/transformers/models/luke/modeling_luke.py#L319-L353
+ # Taken from the HF repository to make it easier to add features -- currently identical to LukeForQuestionAnswering from HF.
+
+ @dataclass
+ class LukeQuestionAnsweringModelOutput(ModelOutput):
+     """
+     Outputs of question answering models.
+
+     Args:
+         loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
+             Total span extraction loss is the sum of a Cross-Entropy for the start and end positions.
+         start_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length)`):
+             Span-start scores (before SoftMax).
+         end_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length)`):
+             Span-end scores (before SoftMax).
+         hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
+             Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
+             one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.
+
+             Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
+         entity_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
+             Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of
+             shape `(batch_size, entity_length, hidden_size)`. Entity hidden-states of the model at the output of each
+             layer plus the initial entity embedding outputs.
+         attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
+             Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
+             sequence_length)`.
+
+             Attention weights after the attention softmax, used to compute the weighted average in the self-attention
+             heads.
+     """
+
+     loss: Optional[torch.FloatTensor] = None
+     start_logits: torch.FloatTensor = None
+     end_logits: torch.FloatTensor = None
+     hidden_states: Optional[Tuple[torch.FloatTensor]] = None
+     entity_hidden_states: Optional[Tuple[torch.FloatTensor]] = None
+     attentions: Optional[Tuple[torch.FloatTensor]] = None
+
+ class AugmentedLukeForQuestionAnswering(LukePreTrainedModel):
+     def __init__(self, config):
+         super().__init__(config)
+
+         # This is 2 (start and end of the answer span).
+         self.num_labels = config.num_labels
+
+         self.luke = LukeModel(config, add_pooling_layer=False)
+
+         '''
+         Any improvements to the model are expected here: additional features, anything...
+         '''
+         self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
+         self.linear_dropout = nn.Dropout(0.1)
+
+         # Initialize weights and apply final processing
+         self.post_init()
+
+     def forward(
+         self,
+         input_ids: Optional[torch.LongTensor] = None,
+         attention_mask: Optional[torch.FloatTensor] = None,
+         token_type_ids: Optional[torch.LongTensor] = None,
+         position_ids: Optional[torch.FloatTensor] = None,
+         entity_ids: Optional[torch.LongTensor] = None,
+         entity_attention_mask: Optional[torch.FloatTensor] = None,
+         entity_token_type_ids: Optional[torch.LongTensor] = None,
+         entity_position_ids: Optional[torch.LongTensor] = None,
+         head_mask: Optional[torch.FloatTensor] = None,
+         inputs_embeds: Optional[torch.FloatTensor] = None,
+         start_positions: Optional[torch.LongTensor] = None,
+         end_positions: Optional[torch.LongTensor] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[Tuple, LukeQuestionAnsweringModelOutput]:
+         r"""
+         start_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for position (index) of the start of the labelled span for computing the token classification loss.
+             Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
+             are not taken into account for computing the loss.
+         end_positions (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
+             Labels for position (index) of the end of the labelled span for computing the token classification loss.
+             Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence
+             are not taken into account for computing the loss.
+         """
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         outputs = self.luke(
+             input_ids=input_ids,
+             attention_mask=attention_mask,
+             token_type_ids=token_type_ids,
+             position_ids=position_ids,
+             entity_ids=entity_ids,
+             entity_attention_mask=entity_attention_mask,
+             entity_token_type_ids=entity_token_type_ids,
+             entity_position_ids=entity_position_ids,
+             head_mask=head_mask,
+             inputs_embeds=inputs_embeds,
+             output_attentions=output_attentions,
+             output_hidden_states=output_hidden_states,
+             return_dict=True,
+         )
+
+         sequence_output = outputs.last_hidden_state
+         sequence_output = self.linear_dropout(sequence_output)
+
+         logits = self.qa_outputs(sequence_output)
+         start_logits, end_logits = logits.split(1, dim=-1)
+         start_logits: torch.Tensor = start_logits.squeeze(-1)
+         end_logits = end_logits.squeeze(-1)
+
+         total_loss = None
+         if start_positions is not None and end_positions is not None:
+             # If we are on multi-GPU, squeeze the extra dimension
+             if len(start_positions.size()) > 1:
+                 start_positions = start_positions.squeeze(-1)
+             if len(end_positions.size()) > 1:
+                 end_positions = end_positions.squeeze(-1)
+             # Sometimes the start/end positions are outside our model inputs; we ignore these terms
+             ignored_index = start_logits.size(1)
+             start_positions.clamp_(0, ignored_index)
+             end_positions.clamp_(0, ignored_index)
+
+             loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
+             start_loss = loss_fct(start_logits, start_positions)
+             end_loss = loss_fct(end_logits, end_positions)
+             total_loss = (start_loss + end_loss) / 2
+
+         if not return_dict:
+             return tuple(
+                 v
+                 for v in [
+                     total_loss,
+                     start_logits,
+                     end_logits,
+                     outputs.hidden_states,
+                     outputs.entity_hidden_states,
+                     outputs.attentions,
+                 ]
+                 if v is not None
+             )
+
+         return LukeQuestionAnsweringModelOutput(
+             loss=total_loss,
+             start_logits=start_logits,
+             end_logits=end_logits,
+             hidden_states=outputs.hidden_states,
+             entity_hidden_states=outputs.entity_hidden_states,
+             attentions=outputs.attentions,
+         )
+
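+ # Usage sketch (note: the training path below uses AutoModelForQuestionAnswering,
+ # not this class); the custom head can be initialised from pretrained LUKE weights with
+ #     model = AugmentedLukeForQuestionAnswering.from_pretrained("studio-ousia/luke-base")
+ # where the qa_outputs layer is freshly initialised by post_init().
+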
+ # Data used to train the model - SquadShifts is designed as a validation/testing set,
+ # so each question has multiple answers; keep the shortest one as the training label.
+ def get_squadshifts_training():
+     wiki = load_dataset("squadshifts", "new_wiki")["test"]
+     nyt = load_dataset("squadshifts", "nyt")["test"]
+     reddit = load_dataset("squadshifts", "reddit")["test"]
+     raw_dataset = concatenate_datasets([wiki, nyt, reddit])
+     updated = raw_dataset.map(validation_to_train)
+     return updated
+
+ def validation_to_train(example):
+     answers = example["answers"]
+     answer_text = answers["text"]
+     # Index of the shortest answer string
+     index_min = min(range(len(answer_text)), key=lambda i: len(answer_text[i]))
+     answers["text"] = answers["text"][index_min:index_min + 1]
+     answers["answer_start"] = answers["answer_start"][index_min:index_min + 1]
+     return example
+
+ # Get the subset whose questions contain a specific question word
+ def get_dataset(dataset, pattern):
+     return dataset.filter(lambda x: bool(re.search(r"\b{}\b".format(pattern), x["question"], flags=re.IGNORECASE)))
+
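+ # For example, get_dataset(raw_validation, "who") keeps only examples whose question
+ # contains the standalone word "who" (case-insensitive); the ".*" pattern in
+ # question_list acts as a catch-all that matches every question.
+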
+ if __name__ == "__main__":
+     # Setting up the tokenizer and helper functions.
+     # Work-around for the fast tokenizer: RoBERTa and LUKE share the same subword vocab,
+     # and we are not using the entity functions of the LUKE tokenizer anyway.
+     tokenizer = AutoTokenizer.from_pretrained(base_tokenizer)
+
+     # Necessary initialization
+     max_length = 512
+     stride = 128
+     batch_size = 8
+     n_best = 20
+     max_answer_length = 30
+     metric = evaluate.load("squad")
+     raw_datasets = load_dataset("squad")
+
+     raw_train = raw_datasets["train"]
+     raw_validation = raw_datasets["validation"]
+
+     def compute_metrics(start_logits, end_logits, features, examples):
+         example_to_features = collections.defaultdict(list)
+         for idx, feature in enumerate(features):
+             example_to_features[feature["example_id"]].append(idx)
+
+         predicted_answers = []
+         for example in tqdm(examples):
+             example_id = example["id"]
+             context = example["context"]
+             answers = []
+
+             # Loop through all features associated with that example
+             for feature_index in example_to_features[example_id]:
+                 start_logit = start_logits[feature_index]
+                 end_logit = end_logits[feature_index]
+                 offsets = features[feature_index]["offset_mapping"]
+
+                 start_indexes = np.argsort(start_logit)[-1 : -n_best - 1 : -1].tolist()
+                 end_indexes = np.argsort(end_logit)[-1 : -n_best - 1 : -1].tolist()
+                 for start_index in start_indexes:
+                     for end_index in end_indexes:
+                         # Skip answers that are not fully in the context
+                         if offsets[start_index] is None or offsets[end_index] is None:
+                             continue
+                         # Skip answers with a length that is either < 0 or > max_answer_length
+                         if (
+                             end_index < start_index
+                             or end_index - start_index + 1 > max_answer_length
+                         ):
+                             continue
+
+                         answer = {
+                             "text": context[offsets[start_index][0] : offsets[end_index][1]],
+                             "logit_score": start_logit[start_index] + end_logit[end_index],
+                         }
+                         answers.append(answer)
+
+             # Select the answer with the best score
+             if len(answers) > 0:
+                 best_answer = max(answers, key=lambda x: x["logit_score"])
+                 predicted_answers.append(
+                     {"id": example_id, "prediction_text": best_answer["text"]}
+                 )
+             else:
+                 predicted_answers.append({"id": example_id, "prediction_text": ""})
+
+         theoretical_answers = [{"id": ex["id"], "answers": ex["answers"]} for ex in examples]
+         return metric.compute(predictions=predicted_answers, references=theoretical_answers)
+
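+     # The "squad" metric returns a dict with keys "exact_match" and "f1" as percentages,
+     # e.g. {"exact_match": 80.5, "f1": 88.2} (illustrative numbers only).
+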
+     def preprocess_training_examples(examples):
+         questions = [q.strip() for q in examples["question"]]
+         inputs = tokenizer(
+             questions,
+             examples["context"],
+             max_length=max_length,
+             truncation="only_second",
+             stride=stride,
+             return_overflowing_tokens=True,
+             return_offsets_mapping=True,
+             padding="max_length",
+         )
+
+         offset_mapping = inputs.pop("offset_mapping")
+         sample_map = inputs.pop("overflow_to_sample_mapping")
+         answers = examples["answers"]
+         start_positions = []
+         end_positions = []
+
+         for i, offset in enumerate(offset_mapping):
+             sample_idx = sample_map[i]
+             answer = answers[sample_idx]
+             start_char = answer["answer_start"][0]
+             end_char = answer["answer_start"][0] + len(answer["text"][0])
+             sequence_ids = inputs.sequence_ids(i)
+
+             # Find the start and end of the context
+             idx = 0
+             while sequence_ids[idx] != 1:
+                 idx += 1
+             context_start = idx
+             while sequence_ids[idx] == 1:
+                 idx += 1
+             context_end = idx - 1
+
+             # If the answer is not fully inside the context, the label is (0, 0)
+             if offset[context_start][0] > start_char or offset[context_end][1] < end_char:
+                 start_positions.append(0)
+                 end_positions.append(0)
+             else:
+                 # Otherwise it's the start and end token positions
+                 idx = context_start
+                 while idx <= context_end and offset[idx][0] <= start_char:
+                     idx += 1
+                 start_positions.append(idx - 1)
+
+                 idx = context_end
+                 while idx >= context_start and offset[idx][1] >= end_char:
+                     idx -= 1
+                 end_positions.append(idx + 1)
+
+         inputs["start_positions"] = start_positions
+         inputs["end_positions"] = end_positions
+         return inputs
+
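+     # Worked example for the start-position search above: if the answer starts at
+     # character 10 and the context token offsets are [(0, 5), (6, 12), (13, 20)],
+     # the loop stops at the token with offset (13, 20) (since 13 > 10) and records
+     # the previous token, whose span (6, 12) contains character 10.
+     # (Illustrative offsets, not taken from SQuAD.)
+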
+     def preprocess_validation_examples(examples):
+         questions = [q.strip() for q in examples["question"]]
+         inputs = tokenizer(
+             questions,
+             examples["context"],
+             max_length=max_length,
+             truncation="only_second",
+             stride=stride,
+             return_overflowing_tokens=True,
+             return_offsets_mapping=True,
+             padding="max_length",
+         )
+
+         sample_map = inputs.pop("overflow_to_sample_mapping")
+         example_ids = []
+
+         for i in range(len(inputs["input_ids"])):
+             sample_idx = sample_map[i]
+             example_ids.append(examples["id"][sample_idx])
+
+             # Mask out offsets that do not belong to the context, so compute_metrics
+             # can skip candidate spans that fall outside it.
+             sequence_ids = inputs.sequence_ids(i)
+             offset = inputs["offset_mapping"][i]
+             inputs["offset_mapping"][i] = [
+                 o if sequence_ids[k] == 1 else None for k, o in enumerate(offset)
+             ]
+
+         inputs["example_id"] = example_ids
+         return inputs
+
+     if train:
+         model = AutoModelForQuestionAnswering.from_pretrained(base_model).to(device)
+         model.train()
+
+         if squad_shift:
+             raw_train = get_squadshifts_training()
+
+         train_dataset = raw_train.map(
+             preprocess_training_examples,
+             batched=True,
+             remove_columns=raw_train.column_names,
+         )
+
+         validation_dataset = raw_validation.map(
+             preprocess_validation_examples,
+             batched=True,
+             remove_columns=raw_validation.column_names,
+         )
+
+         # --------------- PEFT -------------------- #
+         # One epoch without PEFT took about 2h on my machine with CUDA;
+         # PEFT's accuracy was noticeably worse, though.
+         if PEFT:
+             from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
+
+             # ---- Collect the names of all linear layers as LoRA targets ----
+             pattern = r'\((\w+)\): Linear'
+             linear_layers = re.findall(pattern, str(model.modules))
+             target_modules = list(set(linear_layers))
+
+             # If using PEFT, consider increasing r for better performance
+             peft_config = LoraConfig(
+                 task_type=TaskType.QUESTION_ANS, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1, target_modules=target_modules, bias='all'
+             )
+
+             model = get_peft_model(model, peft_config)
+             model.print_trainable_parameters()
+
+             trained_model += "_PEFT"
+
+         # ------------------------------------------ #
+
+         args = TrainingArguments(
+             trained_model,
+             evaluation_strategy="no",
+             save_strategy="epoch",
+             learning_rate=2e-5,
+             per_device_train_batch_size=batch_size,
+             per_device_eval_batch_size=batch_size,
+             num_train_epochs=3,
+             weight_decay=0.01,
+             push_to_hub=True,
+             fp16=fp16,
+         )
+
+         trainer = Trainer(
+             model,
+             args,
+             train_dataset=train_dataset,
+             eval_dataset=validation_dataset,
+             data_collator=default_data_collator,
+             tokenizer=tokenizer,
+         )
+
+         # The positional argument of Trainer.train is resume_from_checkpoint;
+         # None starts training from scratch.
+         trainer.train(train_checkpoint)
+
+     if test:
+         out = "out.txt"
+         for j in range(1):
+             # model = AutoModelForQuestionAnswering.from_pretrained(model_list[j]).to(device)
+             # tokenizer = AutoTokenizer.from_pretrained(tokenizer_list[j])
+             # Normal case:
+             # test_validation = raw_validation
+             for question in question_list:
+                 model_name = "botcon/XLNET_squad_finetuned_large"
+                 model = AutoModelForQuestionAnswering.from_pretrained(model_name).to(device)
+                 model.eval()
+                 tokenizer = AutoTokenizer.from_pretrained(base_tokenizer)
+                 test_validation = get_dataset(raw_validation, question)
+                 exact_match = 0
+                 f1 = 0
+                 validation_size = 50
+                 start = 0
+                 end = validation_size
+
+                 with torch.no_grad():
+                     # Evaluate in chunks of validation_size examples and accumulate
+                     # size-weighted averages of the metrics.
+                     while start < len(test_validation):
+                         small_eval_set = test_validation.select(range(start, min(end, len(test_validation))))
+                         eval_set = small_eval_set.map(
+                             preprocess_validation_examples,
+                             batched=True,
+                             remove_columns=test_validation.column_names,
+                         )
+                         eval_set_for_model = eval_set.remove_columns(["example_id", "offset_mapping"])
+                         eval_set_for_model.set_format("torch")
+                         batch = {k: eval_set_for_model[k].to(device) for k in eval_set_for_model.column_names}
+                         outputs = model(**batch)
+                         start_logits = outputs.start_logits.cpu().numpy()
+                         end_logits = outputs.end_logits.cpu().numpy()
+                         res = compute_metrics(start_logits, end_logits, eval_set, small_eval_set)
+                         exact_match += res["exact_match"] * (len(small_eval_set) / len(test_validation))
+                         f1 += res["f1"] * (len(small_eval_set) / len(test_validation))
+                         start += validation_size
+                         end += validation_size
+
+                 print("F1 score: {}".format(f1))
+                 print("Exact match: {}".format(exact_match))
+                 with open(out, "a+") as file:
+                     file.write("Model: {}, Question: {}, Size: {}".format(model_name, question, len(test_validation)))
+                     file.write("\n")
+                     file.write("F1 score: {}".format(f1))
+                     file.write("\n")
+                     file.write("Exact match: {}".format(exact_match))
+                     file.write("\n")
README.md ADDED
@@ -0,0 +1,26 @@
+ Main Script: QuestionAnswering.py
+
+ The script uses the Hugging Face libraries for managing the datasets, importing/exporting models, and training.
+
+ There are various variables at the start of the script (an example configuration follows this list):
+ - train: Train a new model
+ - PEFT: Whether to use PEFT during training
+ - tf32/fp16: Mixed-precision training choices
+ - trained_model: Name of the trained model (to be pushed to the HF Hub)
+ - train_checkpoint: Checkpoint to resume training from (None by default)
+ - squad_shift: Whether to include extra data (SquadShifts)
+ - base_tokenizer: Tokenizer of the base model
+ - base_model: Pre-trained model
+
+ - test: Test a model
+ - tokenizer_list/model_list/question_list: Which tokenizers, models and question words to test.
+
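+ For example, a LUKE training run might be configured as follows (illustrative values, mirroring defaults in the script):
+
+     train = True
+     test = False
+     PEFT = False
+     trained_model = "LUKE_squad_finetuned_qa_tf32"
+     base_tokenizer = "roberta-base"   # LUKE shares RoBERTa's subword vocab
+     base_model = "studio-ousia/luke-base"
+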
+ CUDA is enabled if available.
+ Training requires the user to log in to the Hugging Face Hub (via a command-line token or through the script); an example login is shown below. Alternatively, do not push to the Hub, and a local repository will be created instead.
+
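+ One way to authenticate from Python (the token can also be supplied on the command line with huggingface-cli login; "hf_..." is a placeholder for your own token):
+
+     from huggingface_hub import login
+     login(token="hf_...")
+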
+ Hugging Face repositories created (models produced):
+ - botcon/XLNET_squad_finetuned_large
+ - botcon/XLNET_squadshift_finetuned_large
+ - botcon/LUKE_squad_finetuned_large
+ - botcon/LUKE_squadshift_finetuned_large
+ - botcon/LUKE_squad_what