Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.

#103
by Mubarak127 - opened

from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# model, tokenizer, data and OUTPUT_DIR are defined earlier in the notebook.

# LoRA hyperparameters
lora_r = 16
lora_alpha = 64
lora_dropout = 0.1
lora_target_modules = [
    "q_proj",
    "up_proj",
    "o_proj",
    "k_proj",
    "down_proj",
    "gate_proj",
    "v_proj",
]

peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=lora_target_modules,
    bias="none",
    task_type="CAUSAL_LM",
)

training_arguments = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=1,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,  # a float below 1 is read as a fraction of total training steps
    warmup_ratio=0.05,
    save_strategy="epoch",
    group_by_length=True,
    output_dir=OUTPUT_DIR,
    report_to="tensorboard",
    save_safetensors=True,
    lr_scheduler_type="cosine",
    seed=42,
)

trainer = SFTTrainer(
    model=model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=4096,
    tokenizer=tokenizer,
    args=training_arguments,
)
trainer.train()

/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(

RuntimeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

21 frames
~/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/898df1396f35e447d5fe44e0a3ccaaaa69f30d36/modeling_falcon.py in forward(self, query, key, past_key_values_length)
106 batch, seq_len, head_dim = query.shape
107 cos, sin = self.cos_sin(seq_len, past_key_values_length, query.device, query.dtype)
--> 108 return (query * cos) + (rotate_half(query) * sin), (key * cos) + (rotate_half(key) * sin)
109
110

RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.

How do I resolve this issue? Any help in debugging this is appreciated, thanks!

Did you find the solution?

Yes, I resolved the bug.

Sorry, what is the solution? Could you please share it? I encountered the same thing.

wandb: WARNING The run_name is currently set to the same value as TrainingArguments.output_dir. If this was not intended, please specify a different run name by setting the TrainingArguments.run_name parameter.
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit: ··········
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
Tracking run with wandb version 0.19.1
Run data is saved locally in /content/wandb/run-20241226_142554-eejhfw4v
Syncing run experiments to Weights & Biases (docs)
View project at https://wandb.ai/joshualxndrs-binus-university/huggingface
View run at https://wandb.ai/joshualxndrs-binus-university/huggingface/runs/eejhfw4v
/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
return fn(*args, **kwargs)

RuntimeError Traceback (most recent call last)
in <cell line: 22>()
20 )
21 model.config.use_cache = False
---> 22 trainer.train()

18 frames
~/.cache/huggingface/modules/transformers_modules/vilsonrodrigues/falcon-7b-instruct-sharded/0e7ea20c0bfd0665eaf3835f1efd12a0e8f02d90/modeling_falcon.py in forward(self, query, key, past_key_values_length)
106 batch, seq_len, head_dim = query.shape
107 cos, sin = self.cos_sin(seq_len, past_key_values_length, query.device, query.dtype)
--> 108 return (query * cos) + (rotate_half(query) * sin), (key * cos) + (rotate_half(key) * sin)
109
110

RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
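
The fix was never posted in this thread, so the following is only a minimal sketch of what the error message itself suggests, not the confirmed solution. It reproduces the error in plain PyTorch: a tensor created under torch.inference_mode() is an "inference tensor", and autograd refuses to save it for backward; cloning it outside inference mode yields a normal tensor. The names cos, query and try_backward are illustrative only and are not taken from modeling_falcon.py. One plausible cause in the Falcon case (an assumption, not verified here) is that the rotary cos/sin values were first computed and cached while the model ran under inference mode, for example during a generation call made before training.

```python
import torch

# A tensor created under inference_mode is an "inference tensor".
with torch.inference_mode():
    cos = torch.ones(4)

query = torch.randn(4, requires_grad=True)


def try_backward(factor):
    """Multiply query by factor and backpropagate.

    Returns None on success, or the RuntimeError message on failure.
    """
    try:
        (query * factor).sum().backward()
        return None
    except RuntimeError as err:
        return str(err)


# Autograd has to save `cos` to compute the gradient of `query`, which fails.
error_message = try_backward(cos)
print(error_message)

# Workaround from the error message: clone outside inference mode.
cos_normal = cos.clone()
ok = try_backward(cos_normal)
print(cos.is_inference(), cos_normal.is_inference(), ok)
```

If the cached-values assumption holds for your run, rebuilding or reloading the model before calling trainer.train(), so that nothing cached under inference mode is reused, may be a simpler route than patching the remote modeling code.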
