Multi-GPU training fails when using device_map = "auto"
#23, opened by aveer30
Hi, I get an error when fine-tuning the model with device_map="auto" across multiple GPUs. The issue looks similar to the one reported for the 128k variant, and a fix is proposed in the discussion linked below. Could one of you verify it and push a fix here as well? Thanks.
https://huggingface.co/microsoft/Phi-3-small-128k-instruct/discussions/19#6677dc5020ff491d382a0221
File "/opt/conda/lib/python3.10/site-packages/triton/runtime/jit.py", line 425, in run
kernel.run(grid_0, grid_1, grid_2, kernel.num_warps, kernel.num_ctas, # number of warps/ctas per instance
ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?)
Same error here for phi-3-small-8k
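
For anyone hitting this: a common cause of this Triton error with device_map="auto" is that the kernel is launched while the current CUDA device is still cuda:0, although the tensors for that layer live on another GPU. Triton then cannot access the pointer and reports it as a possible CPU tensor. A minimal sketch of a guard for that case is below. It rests on assumptions: launch_device_guard is a hypothetical helper name, not part of the model code, and I have not confirmed that the fix in the linked discussion works this way.

```python
import contextlib


def launch_device_guard(tensor):
    """Return a context manager that makes `tensor`'s GPU the current CUDA device.

    Hypothetical helper (not from the model repo). For non-CUDA tensors it
    returns a no-op context, so the caller can still see the original error
    if a layer really was offloaded to CPU.
    """
    device = tensor.device
    if device.type != "cuda":
        return contextlib.nullcontext()
    # Imported lazily so the helper can be inspected on a machine without torch.
    import torch
    return torch.cuda.device(device)


# Intended use around a Triton kernel launch (names here are placeholders):
#
#     with launch_device_guard(q):
#         kernel[grid](q, k, v, out)
```

If the guard does not help, it may be worth printing model.hf_device_map after loading to check whether any module was placed on "cpu" or "disk", since that would produce the same message for a different reason.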