2024-08-30 20:25:26.549777: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-08-30 20:25:26.568217: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-08-30 20:25:26.590253: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-08-30 20:25:26.597224: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-08-30 20:25:26.612962: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-30 20:25:27.916342: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1494: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
08/30/2024 20:25:29 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
08/30/2024 20:25:29 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=epoch,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=2,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/content/dissertation/scripts/ner/output/tb,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=f1,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=10.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/content/dissertation/scripts/ner/output,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=32,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=/content/dissertation/scripts/ner/output,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=epoch,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
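Note the FutureWarning above: `evaluation_strategy` is deprecated in favour of `eval_strategy` and, per the warning, will be removed in Transformers 4.46. Below is a minimal sketch (not the dissertation's actual launch script) of the non-default arguments visible in the dump, written with the non-deprecated name; the remaining values in the dump appear to be Trainer defaults.

```python
# Sketch only: TrainingArguments equivalent to the dump above, using the
# non-deprecated `eval_strategy` keyword that the FutureWarning asks for.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="/content/dissertation/scripts/ner/output",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    do_predict=True,
    eval_strategy="epoch",          # was: evaluation_strategy="epoch"
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    greater_is_better=True,
    learning_rate=5e-5,
    num_train_epochs=10,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    seed=42,
    push_to_hub=True,
    report_to=["tensorboard"],
    logging_dir="/content/dissertation/scripts/ner/output/tb",
)
```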
Downloading builder script: 0%| | 0.00/3.54k [00:00>
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/config.json
[INFO|configuration_utils.py:800] 2024-08-30 20:25:41,923 >> Model config RobertaConfig {
  "_name_or_path": "PlanTL-GOB-ES/bsc-bio-ehr-es",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "finetuning_task": "ner",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-ENFERMEDAD",
    "2": "I-ENFERMEDAD"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-ENFERMEDAD": 1,
    "I-ENFERMEDAD": 2,
    "O": 0
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.42.4",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50262
}
[INFO|configuration_utils.py:733] 2024-08-30 20:25:42,016 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/config.json
[INFO|configuration_utils.py:800] 2024-08-30 20:25:42,017 >> Model config RobertaConfig {
  "_name_or_path": "PlanTL-GOB-ES/bsc-bio-ehr-es",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.42.4",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50262
}
[INFO|tokenization_utils_base.py:2161] 2024-08-30 20:25:42,027 >> loading file vocab.json from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/vocab.json
[INFO|tokenization_utils_base.py:2161] 2024-08-30 20:25:42,028 >> loading file merges.txt from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/merges.txt
[INFO|tokenization_utils_base.py:2161] 2024-08-30 20:25:42,028 >> loading file tokenizer.json from cache at None
[INFO|tokenization_utils_base.py:2161] 2024-08-30 20:25:42,028 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2161] 2024-08-30 20:25:42,028 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/special_tokens_map.json
[INFO|tokenization_utils_base.py:2161] 2024-08-30 20:25:42,028 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/tokenizer_config.json
[INFO|configuration_utils.py:733] 2024-08-30 20:25:42,028 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/config.json
[INFO|configuration_utils.py:800] 2024-08-30 20:25:42,029 >> Model config RobertaConfig {…} (same config as printed at 20:25:42,017 above)
[INFO|configuration_utils.py:733] 2024-08-30 20:25:42,112 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/config.json
[INFO|configuration_utils.py:800] 2024-08-30 20:25:42,113 >> Model config RobertaConfig {…} (same config as printed at 20:25:42,017 above)
[INFO|modeling_utils.py:3556] 2024-08-30 20:25:42,300 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--PlanTL-GOB-ES--bsc-bio-ehr-es/snapshots/1e543adb2d21f19d85a89305eebdbd64ab656b99/pytorch_model.bin
[INFO|modeling_utils.py:4354] 2024-08-30 20:25:42,438 >> Some weights of the model checkpoint at PlanTL-GOB-ES/bsc-bio-ehr-es were not used when initializing RobertaForTokenClassification: ['lm_head.bias', 'lm_head.decoder.bias', 'lm_head.decoder.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing RobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:4366] 2024-08-30 20:25:42,438 >> Some weights of RobertaForTokenClassification were not initialized from the model checkpoint at PlanTL-GOB-ES/bsc-bio-ehr-es and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 0%| | 0/27229 [00:00>
The following columns in the training set don't have a corresponding argument in `RobertaForTokenClassification.forward` and have been ignored: tokens, ner_tags, id. If tokens, ner_tags, id are not expected by `RobertaForTokenClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2128] 2024-08-30 20:25:48,850 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-08-30 20:25:48,850 >>   Num examples = 27,229
[INFO|trainer.py:2130] 2024-08-30 20:25:48,850 >>   Num Epochs = 10
[INFO|trainer.py:2131] 2024-08-30 20:25:48,850 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:2134] 2024-08-30 20:25:48,850 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2135] 2024-08-30 20:25:48,851 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2136] 2024-08-30 20:25:48,851 >>   Total optimization steps = 4,250
[INFO|trainer.py:2137] 2024-08-30 20:25:48,851 >>   Number of trainable parameters = 124,055,043
0%| | 0/4250 [00:00>
The following columns in the evaluation set don't have a corresponding argument in `RobertaForTokenClassification.forward` and have been ignored: tokens, ner_tags, id. If tokens, ner_tags, id are not expected by `RobertaForTokenClassification.forward`, you can safely ignore this message.
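The counters printed at the start of training are mutually consistent: 27,229 examples at a per-device batch size of 32 give 851 dataloader batches per epoch; gradient accumulation of 2 turns that into 425 optimizer steps per epoch and 4,250 over 10 epochs, at an effective batch size of 64. A quick sanity check of that arithmetic (the floor division mirrors how the Trainer appears to count update steps; this is an assumption, not copied from its source):

```python
# Back-of-the-envelope check of the step counts printed in the log above.
import math

num_examples = 27_229
per_device_train_batch_size = 32
gradient_accumulation_steps = 2
num_train_epochs = 10

batches_per_epoch = math.ceil(num_examples / per_device_train_batch_size)          # 851 dataloader batches
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps                # 425 optimizer steps
total_train_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # 64
total_optimization_steps = updates_per_epoch * num_train_epochs                     # 4250

print(total_train_batch_size, updates_per_epoch, total_optimization_steps)  # 64 425 4250
```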
[INFO|trainer.py:3788] 2024-08-30 20:27:31,192 >> ***** Running Evaluation *****
[INFO|trainer.py:3790] 2024-08-30 20:27:31,192 >>   Num examples = 6810
[INFO|trainer.py:3793] 2024-08-30 20:27:31,192 >>   Batch size = 8
0%| | 0/852 [00:00>
Saving model checkpoint to /content/dissertation/scripts/ner/output/checkpoint-425
[INFO|configuration_utils.py:472] 2024-08-30 20:27:45,882 >> Configuration saved in /content/dissertation/scripts/ner/output/checkpoint-425/config.json
[INFO|modeling_utils.py:2690] 2024-08-30 20:27:47,247 >> Model weights saved in /content/dissertation/scripts/ner/output/checkpoint-425/model.safetensors
[INFO|tokenization_utils_base.py:2574] 2024-08-30 20:27:47,248 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/checkpoint-425/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-08-30 20:27:47,248 >> Special tokens file saved in /content/dissertation/scripts/ner/output/checkpoint-425/special_tokens_map.json
[INFO|tokenization_utils_base.py:2574] 2024-08-30 20:27:50,017 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-08-30 20:27:50,017 >> Special tokens file saved in /content/dissertation/scripts/ner/output/special_tokens_map.json
10%|█ | 426/4250 [02:01<6:17:25, 5.92s/it]
[… per-step tqdm output for steps 427-496 omitted; the rate recovers from ~6 s/it just after the epoch-1 evaluation and checkpoint save to a steady ~4 it/s …]
12%|█▏ | 497/4250 [02:19<14:40, 4.26it/s]
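With `save_strategy=epoch`, the first checkpoint lands at step 425 (the end of epoch 1), which is the `checkpoint-425` directory written above. A sketch of reloading it for a quick smoke test; the path only exists in the environment that produced this log, and the example sentence is made up:

```python
# Sketch: reload the epoch-1 checkpoint saved above and run it once.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

ckpt = "/content/dissertation/scripts/ner/output/checkpoint-425"
model = AutoModelForTokenClassification.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")  # merges B-/I- subtokens into entity spans
print(ner("Paciente con antecedentes de diabetes mellitus tipo 2."))
```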
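From the progress bars, the run settles at roughly 4 optimizer steps per second once the dataloader recovers after the epoch-1 evaluation and checkpoint. Rough, eyeballed arithmetic (not something the script itself reports):

```python
# Rough wall-clock estimate from the tqdm readings above; purely illustrative.
steps_total = 4250
steps_per_epoch = 425
throughput = 4.0  # it/s, eyeballed from the progress bars

seconds_per_epoch = steps_per_epoch / throughput   # ~106 s of pure stepping per epoch;
                                                    # the ~2 min elapsed at step 426 also
                                                    # includes the first eval and checkpoint save
eta_total_minutes = steps_total / throughput / 60   # ~18 min of stepping for all 10 epochs
print(round(seconds_per_epoch), round(eta_total_minutes, 1))
```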