07/13/2022 17:32:26 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
07/13/2022 17:32:26 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=IntervalStrategy.NO,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_min_num_params=0,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
include_inputs_for_metrics=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=3e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned/runs/Jul13_17-32-24_gpu5,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=2.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=2,
per_device_train_batch_size=2,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
ray_scope=last,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned,
save_on_each_node=False,
save_steps=100000,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_ipex=False,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
xpu_backend=None,
)
07/13/2022 17:32:27 - INFO - datasets.builder - No config specified, defaulting to the single config: phrase_sense_disambiguation/PSD
07/13/2022 17:32:27 - INFO - datasets.builder - Overwrite dataset info from restored data version.
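Read back into code, the non-default values in the dump above correspond to roughly the following call (a minimal sketch; the run actually passed these as command-line flags through the example script's HfArgumentParser, and every other field was left at its transformers 4.20.1 default):

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned",
        overwrite_output_dir=True,
        do_train=True,
        do_eval=True,
        learning_rate=3e-05,
        num_train_epochs=2.0,
        per_device_train_batch_size=2,
        per_device_eval_batch_size=2,
        save_steps=100000,
        seed=42,
    )

Note that evaluation_strategy stays at IntervalStrategy.NO, so no evaluation happens during training; the eval pass at the end of this log runs only because do_eval=True makes the script call trainer.evaluate() explicitly.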
07/13/2022 17:32:27 - INFO - datasets.info - Loading Dataset info from /home/thang/.cache/huggingface/datasets/PiC___phrase_sense_disambiguation/PSD/1.0.0/b06fb909d3a87edce48db98df9113dfea60514936183b4d58e2fc89694cf0c0b
07/13/2022 17:32:27 - WARNING - datasets.builder - Reusing dataset phrase_sense_disambiguation (/home/thang/.cache/huggingface/datasets/PiC___phrase_sense_disambiguation/PSD/1.0.0/b06fb909d3a87edce48db98df9113dfea60514936183b4d58e2fc89694cf0c0b)
07/13/2022 17:32:27 - INFO - datasets.info - Loading Dataset info from /home/thang/.cache/huggingface/datasets/PiC___phrase_sense_disambiguation/PSD/1.0.0/b06fb909d3a87edce48db98df9113dfea60514936183b4d58e2fc89694cf0c0b
  0%|          | 0/3 [00:00<?, ?it/s]
[INFO|configuration_utils.py:…] 2022-07-13 17:32:27,… >> loading configuration file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/config.json
[INFO|configuration_utils.py:708] 2022-07-13 17:32:27,073 >> Model config LongformerConfig {
  "_name_or_path": "../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned",
  "architectures": ["LongformerForQuestionAnswering"],
  "attention_mode": "longformer",
  "attention_probs_dropout_prob": 0.1,
  "attention_window": [512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512, 512],
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "ignore_attention_mask": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 4098,
  "model_type": "longformer",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "sep_token_id": 2,
  "torch_dtype": "float32",
  "transformers_version": "4.20.1",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
[INFO|tokenization_utils_base.py:1701] 2022-07-13 17:32:27,074 >> Didn't find file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1779] 2022-07-13 17:32:27,074 >> loading file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/vocab.json
[INFO|tokenization_utils_base.py:1779] 2022-07-13 17:32:27,074 >> loading file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/merges.txt
[INFO|tokenization_utils_base.py:1779] 2022-07-13 17:32:27,074 >> loading file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/tokenizer.json
[INFO|tokenization_utils_base.py:1779] 2022-07-13 17:32:27,074 >> loading file None
[INFO|tokenization_utils_base.py:1779] 2022-07-13 17:32:27,074 >> loading file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/special_tokens_map.json
[INFO|tokenization_utils_base.py:1779] 2022-07-13 17:32:27,074 >> loading file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/tokenizer_config.json
[INFO|modeling_utils.py:2105] 2022-07-13 17:32:27,160 >> loading weights file ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned/pytorch_model.bin
[INFO|modeling_utils.py:2483] 2022-07-13 17:32:28,342 >> All model checkpoint weights were used when initializing LongformerForQuestionAnswering.
[INFO|modeling_utils.py:2491] 2022-07-13 17:32:28,343 >> All the weights of LongformerForQuestionAnswering were initialized from the model checkpoint at ../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned.
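The loading steps recorded above amount to the following (a sketch, assuming the public PiC/phrase_sense_disambiguation dataset on the Hugging Face Hub and the local PR-pass checkpoint path from the log; AutoModelForQuestionAnswering resolves to LongformerForQuestionAnswering via the config's "architectures" field):

    from datasets import load_dataset
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    # Single config ("PSD"), served from the local cache per the log above.
    raw_datasets = load_dataset("PiC/phrase_sense_disambiguation")

    # Warm-start from the model previously fine-tuned on phrase retrieval (PR-pass).
    checkpoint = "../results/phrase_retrieval/PR-pass/qa/allenai/longformer-base-4096/finetuned"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

The "loading file None" tokenizer line is harmless: it is the slot for the optional added_tokens.json, which the INFO line above it already reported as absent.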
If your task is similar to the task the model of the checkpoint was trained on, you can already use LongformerForQuestionAnswering for predictions without further training.
07/13/2022 17:32:28 - WARNING - datasets.fingerprint - Parameter 'function'=<function main.<locals>.prepare_train_features at 0x7f1ad548a280> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Running tokenizer on train dataset:   0%|          | 0/2 [00:00<?, ?it/s]
07/13/2022 17:32:… - WARNING - datasets.fingerprint - Parameter 'function'=<function main.<locals>.prepare_validation_features at 0x7f1ad548a1f0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead.
Running tokenizer on validation dataset:   0%|          | 0/1 [00:00<?, ?it/s]
[INFO|trainer.py:…] 2022-07-13 17:32:52,… >> ***** Running training *****
[INFO|trainer.py:1517] 2022-07-13 17:32:52,168 >> Num examples = 1650
[INFO|trainer.py:1518] 2022-07-13 17:32:52,168 >> Num Epochs = 2
[INFO|trainer.py:1519] 2022-07-13 17:32:52,168 >> Instantaneous batch size per device = 2
[INFO|trainer.py:1520] 2022-07-13 17:32:52,168 >> Total train batch size (w. parallel, distributed & accumulation) = 2
[INFO|trainer.py:1521] 2022-07-13 17:32:52,168 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1522] 2022-07-13 17:32:52,168 >> Total optimization steps = 1650
  0%|          | 0/1650 [00:00<?, ?it/s]
[INFO|trainer.py:…] 2022-07-13 17:47:45,… >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 893.6642, 'train_samples_per_second': 3.693, 'train_steps_per_second': 1.846, 'train_loss': 0.9066677579012784, 'epoch': 2.0}
100%|██████████| 1650/1650 [14:53<00:00, 1.84it/s]
100%|██████████| 1650/1650 [14:53<00:00, 1.85it/s]
[INFO|trainer.py:2503] 2022-07-13 17:47:45,833 >> Saving model checkpoint to ../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned
[INFO|configuration_utils.py:446] 2022-07-13 17:47:45,834 >> Configuration saved in ../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned/config.json
[INFO|modeling_utils.py:1660] 2022-07-13 17:47:46,557 >> Model weights saved in ../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned/pytorch_model.bin
[INFO|tokenization_utils_base.py:2123] 2022-07-13 17:47:46,558 >> tokenizer config file saved in ../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned/tokenizer_config.json
[INFO|tokenization_utils_base.py:2130] 2022-07-13 17:47:46,558 >> Special tokens file saved in ../results/phrase_sense_disambiguation//qa/allenai/longformer-base-4096-20K-2K/finetuned/special_tokens_map.json
***** train metrics *****
  epoch                    =        2.0
  train_loss               =     0.9067
  train_runtime            = 0:14:53.66
  train_samples            =       1650
  train_samples_per_second =      3.693
  train_steps_per_second   =      1.846
07/13/2022 17:47:46 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:661] 2022-07-13 17:47:46,637 >> The following columns in the evaluation set don't have a corresponding argument in `LongformerForQuestionAnswering.forward` and have been ignored: offset_mapping, example_id. If offset_mapping, example_id are not expected by `LongformerForQuestionAnswering.forward`, you can safely ignore this message.
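The two datasets.fingerprint warnings mean that the preprocessing closures (prepare_train_features / prepare_validation_features, defined inside main() in the example script) could not be hashed, so the tokenized datasets are recomputed on every run instead of being served from cache. A common workaround, sketched below under the assumption that the transform only needs the tokenizer, is to define the function at module level and pass its dependencies through map's fn_kwargs; the real script additionally maps answer character offsets to token start/end positions, which is omitted here, and max_seq_length=4096 is an assumed value matching the model's window:

    # Module-level function: picklable, so datasets can fingerprint and cache it.
    def prepare_train_features(examples, tokenizer, max_seq_length=4096):
        # Tokenize question/context pairs; label construction is omitted.
        return tokenizer(
            examples["question"],
            examples["context"],
            truncation="only_second",
            max_length=max_seq_length,
            padding="max_length",
        )

    train_dataset = raw_datasets["train"].map(
        prepare_train_features,
        batched=True,
        remove_columns=raw_datasets["train"].column_names,
        fn_kwargs={"tokenizer": tokenizer},
    )

The step count in the training header is consistent with the arguments: 1650 examples / (batch size 2 × 1 GPU × 1 accumulation step) = 825 updates per epoch, times 2 epochs = 1650 total optimization steps.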
[INFO|trainer.py:2753] 2022-07-13 17:47:46,639 >> ***** Running Evaluation *****
[INFO|trainer.py:2755] 2022-07-13 17:47:46,639 >> Num examples = 500
[INFO|trainer.py:2758] 2022-07-13 17:47:46,639 >> Batch size = 2
  0%|          | 0/250 [00:00<?, ?it/s]
[INFO|modelcard.py:…] … >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Question Answering', 'type': 'question-answering'}, 'dataset': {'name': 'PiC/phrase_sense_disambiguation ', 'type': 'PiC/phrase_sense_disambiguation', 'args': ''}}
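The evaluation pass above is the script's explicit trainer.evaluate() call (250 steps = 500 examples at batch size 2). A sketch of that tail end, assuming a trainer built from the pieces shown earlier:

    metrics = trainer.evaluate()
    trainer.log_metrics("eval", metrics)   # prints an "***** eval metrics *****" block
    trainer.save_metrics("eval", metrics)  # writes eval_results.json into output_dir

The closing "Dropping the following result" INFO comes from the automatic model-card builder: the result dict carries task and dataset fields but, evidently, no metrics entry, so it is left out of the generated README. It does not affect the saved weights or the evaluation output itself.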