Update config.json

#5
by ybelkada - opened
No description provided.

The fix in transformers for the loss computation is going to be something like

    if isinstance(gate_logits, tuple):
        # cat along the layers, moving every layer's logits to one device first
        gate_logits = torch.cat([gate.cpu() for gate in gate_logits], dim=0)

We overlooked the devices when computing the loss.
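
For reference, a minimal sketch of what a device-aware auxiliary load-balancing loss could look like; the function name, signature, and exact reduction are illustrative, not the final transformers implementation:

    import torch
    import torch.nn.functional as F

    def load_balancing_loss(gate_logits, num_experts, top_k=2):
        # gate_logits: tuple of [num_tokens, num_experts] router logits, one per layer,
        # possibly living on different devices when the model is sharded across GPUs
        compute_device = gate_logits[0].device

        # move every layer's logits to one device before concatenating along dim 0
        concatenated = torch.cat([layer.to(compute_device) for layer in gate_logits], dim=0)

        routing_weights = F.softmax(concatenated, dim=-1)
        _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
        expert_mask = F.one_hot(selected_experts, num_experts)

        # fraction of tokens routed to each expert, and mean router probability per expert
        tokens_per_expert = expert_mask.float().mean(dim=0)      # [top_k, num_experts]
        router_prob_per_expert = routing_weights.mean(dim=0)     # [num_experts]

        return torch.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0)) * num_experts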

pstock changed pull request status to merged

@ArthurZ is this fixed in transformers? I am trying to fine-tune with axolotl, but I get either

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!

or when I change the config.json part like this:

  "output_router_logits": false,

I get:

RuntimeError: !grad_accumulator_.expired() INTERNAL ASSERT FAILED at "../torch/csrc/autograd/saved_variable.cpp":226, please report a bug to PyTorch. No grad accumulator for a saved leaf

Any hints?

No accelerate, just trying to run the training.
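
For completeness, the same flag can also be overridden at load time instead of hand-editing config.json (the model id below is just a placeholder):

    from transformers import AutoConfig, AutoModelForCausalLM

    # placeholder checkpoint id; substitute the actual model
    config = AutoConfig.from_pretrained("path/to/checkpoint", output_router_logits=False)
    model = AutoModelForCausalLM.from_pretrained("path/to/checkpoint", config=config)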
