Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.

#103

by Mubarak127 - opened Jun 29, 2024

Jun 29, 2024

lora_r = 16
lora_alpha = 64
lora_dropout = 0.1
lora_target_modules = [
    "q_proj",
    "up_proj",
    "o_proj",
    "k_proj",
    "down_proj",
    "gate_proj",
    "v_proj",
]

peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=lora_target_modules,
    bias="none",
    task_type="CAUSAL_LM",
)

training_arguments = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=1,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,
    warmup_ratio=0.05,
    save_strategy="epoch",
    group_by_length=True,
    output_dir=OUTPUT_DIR,
    report_to="tensorboard",
    save_safetensors=True,
    lr_scheduler_type="cosine",
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=4096,
    tokenizer=tokenizer,
    args=training_arguments,
)
trainer.train()

/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(

RuntimeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

21 frames
~/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/898df1396f35e447d5fe44e0a3ccaaaa69f30d36/modeling_falcon.py in forward(self, query, key, past_key_values_length)
106 batch, seq_len, head_dim = query.shape
107 cos, sin = self.cos_sin(seq_len, past_key_values_length, query.device, query.dtype)
--> 108 return (query * cos) + (rotate_half(query) * sin), (key * cos) + (rotate_half(key) * sin)
109
110

RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.

How do I resolve this issue? Any help in debugging this is appreciated, thanks!

mostafataha1

Nov 20, 2024

Did you find the solution?

Mubarak127

Nov 20, 2024

Yes, I resolved the bug.

joshualxndrs

Dec 26, 2024

Sorry, what is the solution? Could you please share it? I encountered the same thing.

wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: ··········
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
Tracking run with wandb version 0.19.1
Run data is saved locally in /content/wandb/run-20241226_142554-eejhfw4v
Syncing run experiments to Weights & Biases (docs)
View project at https://wandb.ai/joshualxndrs-binus-university/huggingface
View run at https://wandb.ai/joshualxndrs-binus-university/huggingface/runs/eejhfw4v
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
return fn(*args, **kwargs)

RuntimeError Traceback (most recent call last)
in <cell line: 22>()
20 )
21 model.config.use_cache = False
---> 22 trainer.train()

18 frames
~/.cache/huggingface/modules/transformers_modules/vilsonrodrigues/falcon-7b-instruct-sharded/0e7ea20c0bfd0665eaf3835f1efd12a0e8f02d90/modeling_falcon.py in forward(self, query, key, past_key_values_length)
106 batch, seq_len, head_dim = query.shape
107 cos, sin = self.cos_sin(seq_len, past_key_values_length, query.device, query.dtype)
--> 108 return (query * cos) + (rotate_half(query) * sin), (key * cos) + (rotate_half(key) * sin)
109
110

RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
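
For readers landing here: the thread never records the fix, so the following is only a minimal sketch of what the error message itself describes, in plain PyTorch rather than the Falcon code. A tensor created inside `torch.inference_mode()` is an "inference tensor", and using it later in an operation that autograd must save for backward raises this exact error; cloning it outside inference mode yields a normal tensor. The names `cos`, `query`, and `cos_normal` are illustrative. It is an assumption, not something confirmed in this thread, that the `cos`/`sin` values returned by `self.cos_sin` were first created during an earlier call made under inference mode.

```python
import torch

# A tensor created under inference mode becomes an "inference tensor".
with torch.inference_mode():
    cos = torch.ones(4)

# A normal tensor that takes part in autograd.
query = torch.ones(4, requires_grad=True)

# Multiplying needs to save `cos` for backward, which is not allowed.
try:
    (query * cos).sum().backward()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

# Workaround named in the error message: clone outside inference mode.
cos_normal = cos.clone()
loss = (query * cos_normal).sum()
loss.backward()

print(error_message)
print(cos.is_inference(), cos_normal.is_inference())
```

If that assumption holds for your run, it may help to check whether the model was called (for example for a test generation) under `torch.inference_mode()` before `trainer.train()`, and to reload the model or use `torch.no_grad()` for that call instead.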
