huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
wandb: WARNING Serializing object of type dict that is 589920 bytes
  0%|          | 0/70340 [00:00
Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/main.py", line 513, in main
    data_args.max_train_samples if data_args.max_train_samples is not None else len(train_dataset)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1409, in train
    return inner_training_loop(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 1651, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2349, in training_step
    loss = self.compute_loss(model, inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/trainer.py", line 2381, in compute_loss
    outputs = model(**inputs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 459, in forward
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 247, in coil_forward
    lab_reps = self.tok_proj(outputs_lab.last_hidden_state @ self.label_projection.weight) # Q * LQ * d
  File "/n/fs/nlp-pranjal/SemSup-LMLC/cleaned_code/src/models.py", line 399, in forward_label_embeddings
    desc_attention_mask: Optional[List[int]] = None,
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 1018, in forward
    encoder_outputs = self.encoder(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 607, in forward
    layer_outputs = layer_module(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 493, in forward
    self_attention_outputs = self.attention(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 423, in forward
    self_outputs = self.self(
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/transformers/models/bert/modeling_bert.py", line 355, in forward
    attention_probs = self.dropout(attention_probs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/modules/dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/n/fs/nlp-pranjal/miniconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 1279, in dropout
    return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
RuntimeError: CUDA out of memory. Tried to allocate 782.00 MiB (GPU 0; 10.76 GiB total capacity; 3.28 GiB already allocated; 61.69 MiB free; 3.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
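Both the tokenizers warning and the allocator's own hint in the OOM message point at environment variables. A minimal sketch of acting on them, assuming they are set at the very top of `main.py` (or exported in the launching shell) before `tokenizers` and any CUDA work; the `max_split_size_mb` value of 128 is an illustrative assumption, not something taken from this log:

```python
import os

# Both variables must be set before the libraries that read them act on
# them: before worker processes fork (tokenizers) and before the first
# CUDA allocation (the PyTorch caching allocator).

# Silences the repeated huggingface/tokenizers warning by disabling the
# tokenizer's internal parallelism in forked processes.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Follows the allocator hint ("If reserved memory is >> allocated memory
# try setting max_split_size_mb"): caps the size of blocks the caching
# allocator will split, reducing fragmentation. 128 MiB is illustrative.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

If fragmentation is not the real issue, the more direct lever is the per-replica memory footprint: the failure is in an `nn.DataParallel` replica on GPU 0, so lowering `per_device_train_batch_size` (optionally compensating with `gradient_accumulation_steps`) in the `TrainingArguments` is the usual first step.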