Update config.json
#5
opened by ybelkada
No description provided.
The fix in transformers for the loss computation is going to be something like:

```python
if isinstance(gate_logits, tuple):
    # cat along the layers?
    gate_logits = torch.cat([gate.cpu() for gate in gate_logits], dim=0)
```
We overlooked the devices when computing the loss.
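For context, here is a minimal sketch of what such a device-safe, Switch-Transformer-style auxiliary loss could look like; the helper name `load_balancing_loss`, its signature, and the exact formula are illustrative, not the code that actually landed in transformers:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(gate_logits, num_experts: int, top_k: int = 2) -> torch.Tensor:
    # gate_logits is a tuple with one (batch * seq_len, num_experts) tensor per layer;
    # with model parallelism each layer may live on a different GPU, so move every
    # tensor to one device before concatenating along the layer dimension.
    if isinstance(gate_logits, tuple):
        device = gate_logits[0].device
        gate_logits = torch.cat([gate.to(device) for gate in gate_logits], dim=0)

    routing_weights = F.softmax(gate_logits, dim=-1)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    expert_mask = F.one_hot(selected_experts, num_experts).float()

    # fraction of tokens routed to each expert vs. mean router probability per expert
    tokens_per_expert = expert_mask.mean(dim=(0, 1))
    router_prob_per_expert = routing_weights.mean(dim=0)
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```

The point of the guard is only that all per-layer gate logits end up on one device before `torch.cat`, which avoids the cross-device error below.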
pstock changed pull request status to merged
@ArthurZ is this fixed in transformers? I am trying to fine-tune with axolotl, but I get either
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!
or when I change the config.json part like this:
"output_router_logits": false,
I get:
RuntimeError: !grad_accumulator_.expired() INTERNAL ASSERT FAILED at "../torch/csrc/autograd/saved_variable.cpp":226, please report a bug to PyTorch. No grad accumulator for a saved leaf
Any hints?
No accelerate, just trying to run the training.