Fine-tuning with axolotl not working
#4, opened by joorei
I am trying to fine-tune with axolotl (using axolotl's Docker image), but I get either
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!
or, when I change config.json like this:
"output_router_logits": false,
(as hinted by https://huggingface.co/mistralai/Mixtral-8x7B-v0.1/discussions/5 )
I get:
RuntimeError: !grad_accumulator_.expired() INTERNAL ASSERT FAILED at "../torch/csrc/autograd/saved_variable.cpp":226, please report a bug to PyTorch. No grad accumulator for a saved leaf
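For anyone who wants to reproduce the config change above, here is a minimal sketch that applies it with a script instead of editing the file by hand. The function name `set_router_logits` and the example path are made up for illustration; it only assumes that config.json is plain JSON with a top-level `output_router_logits` key.

```python
import json
from pathlib import Path


def set_router_logits(config_path, enabled=False):
    """Set "output_router_logits" in a model's config.json and save the file.

    Hypothetical helper for illustration; it is not part of axolotl or
    transformers. Returns the updated config as a dict.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["output_router_logits"] = enabled
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (the path is a placeholder for the local model directory):
# set_router_logits("path/to/model/config.json", enabled=False)
```

This only toggles the flag; it says nothing about why either error appears.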
Any hints?
I am not using accelerate; I am just running the training directly with python.
It worked with 2.5. I diffed the config.json files of 2.5 and 2.7, and output_router_logits (along with the transformers version) is the only difference.
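A comparison like the one described can be sketched as follows. The helper names `load_config` and `diff_configs` and the file paths are hypothetical; the sketch only assumes that both config.json files are flat JSON objects.

```python
import json

MISSING = "<missing>"  # marker for a key that exists in only one config


def load_config(path):
    """Read a config.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def diff_configs(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    diff = {}
    for key in sorted(set(old) | set(new)):
        old_value = old.get(key, MISSING)
        new_value = new.get(key, MISSING)
        if old_value != new_value:
            diff[key] = (old_value, new_value)
    return diff


# Example call (paths are placeholders for the two local checkouts):
# print(diff_configs(load_config("2.5/config.json"), load_config("2.7/config.json")))
```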