Error while training the model: tensors with no grads
Hello Mistral community!
I'm trying to fine-tune the model without quantization, but when I reach the training phase I hit the error below and haven't been able to resolve it. I've tried different ways of passing the input data to SFTTrainer (with a formatting function and without, just with a text field) and I've experimented with different parameters, but I'm not sure what the issue is. Could someone please assist me?
Error Message:
```
RuntimeError Traceback (most recent call last)
in <cell line: 1>()
1 with ClearCache():
----> 2 trainer.train()
3 trainer.save_model(model_complete_name)
5 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1495 # Disable progress bars when uploading models during checkpoints to avoid polluting stdout
1496 hf_hub_utils.disable_progress_bars()
-> 1497 return inner_training_loop(
1498 args=args,
1499 resume_from_checkpoint=resume_from_checkpoint,
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1799
1800 with self.accelerator.accumulate(model):
-> 1801 tr_loss_step = self.training_step(model, inputs)
1802
1803 if (
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in training_step(self, model, inputs)
2657 scaled_loss.backward()
2658 else:
-> 2659 self.accelerator.backward(loss)
2660
2661 return loss.detach() / self.args.gradient_accumulation_steps
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py in backward(self, loss, **kwargs)
1982 return
1983 elif self.scaler is not None:
-> 1984 self.scaler.scale(loss).backward(**kwargs)
1985 else:
1986 loss.backward(**kwargs)
/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
485 inputs=inputs,
486 )
--> 487 torch.autograd.backward(
488 self, gradient, retain_graph, create_graph, inputs=inputs
489 )
/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
198 # some Python versions print out the first line of a multi-line function
199 # calls in the traceback and some print out the last line
--> 200 Variable.execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
201 tensors, grad_tensors, retain_graph, create_graph, inputs,
202 allow_unreachable=True, accumulate_grad=True) # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
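For reference, this is the generic PyTorch error you get whenever `backward()` is called on a tensor that was never attached to the autograd graph. A minimal standalone reproduction (unrelated to my training setup) is:

```python
import torch

x = torch.ones(3)   # requires_grad defaults to False, so no graph is recorded
loss = x.sum()      # loss therefore has no grad_fn
loss.backward()     # RuntimeError: element 0 of tensors does not require grad ...
```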
My code:

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTTrainer

model_id = 'mistralai/Mistral-7B-Instruct-v0.1'

model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=False)
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj",
                    "k_proj",
                    "v_proj",
                    "o_proj"],
)

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

max_seq_length = 2048

trainer = SFTTrainer(
    model=model,
    train_dataset=data.with_format("torch"),
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=format_instruction,
    args=TrainingArguments(
        output_dir=model_complete_name,
        num_train_epochs=training_arguments_mistral['num_train_epochs'],
        per_device_train_batch_size=training_arguments_mistral['per_device_train_batch_size'],
        gradient_accumulation_steps=training_arguments_mistral['gradient_accumulation_steps'],
        gradient_checkpointing=training_arguments_mistral['gradient_checkpointing'],
        optim=training_arguments_mistral['optim'],
        lr_scheduler_type=training_arguments_mistral['lr_scheduler_type'],
        logging_steps=training_arguments_mistral['logging_steps'],
        save_strategy=training_arguments_mistral['save_strategy'],
        save_total_limit=training_arguments_mistral['save_total_limit'],
        learning_rate=training_arguments_mistral['learning_rate'],
        fp16=training_arguments_mistral['fp16'],
        max_steps=training_arguments_mistral['max_steps'],
        max_grad_norm=training_arguments_mistral['max_grad_norm'],
        warmup_ratio=training_arguments_mistral['warmup_ratio'],
        disable_tqdm=training_arguments_mistral['disable_tqdm'],
        weight_decay=training_arguments_mistral['weight_decay'],
        hub_model_id=training_arguments_mistral['hub_model_id'],
        push_to_hub=training_arguments_mistral['push_to_hub'],
        hub_strategy=training_arguments_mistral['hub_strategy'],
        hub_always_push=training_arguments_mistral['hub_always_push'],
        hub_token=training_arguments_mistral['hub_token'],
        hub_private_repo=training_arguments_mistral['hub_private_repo']
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

trainer.train()
trainer.save_model(model_complete_name)
```
I also tried loading the model like this:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    use_cache=False,
    device_map="auto",
    torch_dtype=torch.float16,
)
model.config.pretraining_tp = 1
```
My dataset format:

```
Dataset({
    features: ['column1', 'column2', 'column3', 'column4', 'column5'],
    num_rows: 2000
})
```
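The `format_instruction` function just flattens each row into a single prompt string for SFTTrainer. A hypothetical sketch using the placeholder column names from the dump above (the real fields and template are different) looks roughly like this:

```python
def format_instruction(sample):
    # Hypothetical template with placeholder column names; the real function
    # builds an [INST] ... [/INST] style prompt from the actual dataset fields.
    return f"[INST] {sample['column1']} [/INST] {sample['column2']}"
```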
Note: when I run this code with quantization it works fine, but I'm not sure which parameter or configuration I need to change to make the non-quantized version train. I've also tried loading the model without the PEFT configuration.
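For comparison, the quantized run loads the model through a `BitsAndBytesConfig` along these lines (the exact values below are illustrative, not my actual config):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit config; the actual quantization settings used may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    use_cache=False,
)
```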
Update

The issue was resolved by adding these two calls after applying the PEFT transformations:

```python
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
```
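In context, the calls go right after wrapping the model with PEFT; a minimal sketch of the relevant part of the setup above:

```python
model = AutoModelForCausalLM.from_pretrained(model_id, use_cache=False)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)

# With the base weights frozen by LoRA and gradient checkpointing enabled, the
# checkpointed activations no longer require grad and the loss loses its
# grad_fn; enable_input_require_grads() hooks the input embeddings so the
# forward pass stays attached to the autograd graph.
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
```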