Model not working with accelerate for inference.
#25 · opened by Satandon1999
I am trying to run multi-GPU inference with the accelerate library, following the instructions here: https://medium.com/@geronimo7/llms-multi-gpu-inference-with-accelerate-5a8333e4c5db.
The setup works perfectly fine for the Phi-3-mini models, but with Phi-3-small I hit the following error:
e90707fca55744aebfc511579dbd663c00000C:353:385 [1] NCCL INFO [Service thread] Connection closed by localRank 1
sh: 1: cannot create 0.1/compile-ptx-log-7f1750: Directory nonexistent
SystemLog: Traceback (most recent call last):
SystemLog: File "/mnt/azureml/cr/j/c3a3f3df23864dbdbb10a3f2d941acb8/exe/wd/run.py", line 141, in main
SystemLog: output = model.generate(input_ids=input_ids,
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
SystemLog: return func(*args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
SystemLog: result = self._sample(
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/transformers/generation/utils.py", line 2397, in _sample
SystemLog: outputs = self(
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
SystemLog: return self._call_impl(*args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
SystemLog: return forward_call(*args, **kwargs)
SystemLog: File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-small-128k-instruct/351b4fdafe349962997fd94996349824a7cd0214/modeling_phi3_small.py", line 956, in forward
SystemLog: outputs = self.model(
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
SystemLog: return self._call_impl(*args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
SystemLog: return forward_call(*args, **kwargs)
SystemLog: File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-small-128k-instruct/351b4fdafe349962997fd94996349824a7cd0214/modeling_phi3_small.py", line 859, in forward
SystemLog: layer_outputs = decoder_layer(
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
SystemLog: return self._call_impl(*args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
SystemLog: return forward_call(*args, **kwargs)
SystemLog: File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-small-128k-instruct/351b4fdafe349962997fd94996349824a7cd0214/modeling_phi3_small.py", line 671, in forward
SystemLog: hidden_states, self_attn_weights, present_key_values = self.self_attn(
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
SystemLog: return self._call_impl(*args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
SystemLog: return forward_call(*args, **kwargs)
SystemLog: File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-small-128k-instruct/351b4fdafe349962997fd94996349824a7cd0214/modeling_phi3_small.py", line 616, in forward
SystemLog: attn_function_output = self._apply_blocksparse_attention(
SystemLog: File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-small-128k-instruct/351b4fdafe349962997fd94996349824a7cd0214/modeling_phi3_small.py", line 382, in _apply_blocksparse_attention
SystemLog: context_layer = self._blocksparse_layer(
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
SystemLog: return self._call_impl(*args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
SystemLog: return forward_call(*args, **kwargs)
SystemLog: File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-small-128k-instruct/351b4fdafe349962997fd94996349824a7cd0214/triton_blocksparse_attention_layer.py", line 165, in forward
SystemLog: return blocksparse_flash_attn_padded_fwd(
SystemLog: File "/root/.cache/huggingface/modules/transformers_modules/microsoft/Phi-3-small-128k-instruct/351b4fdafe349962997fd94996349824a7cd0214/triton_flash_blocksparse_attn.py", line 996, in blocksparse_flash_attn_padded_fwd
SystemLog: _fwd_kernel_batch_inference[grid](
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
SystemLog: return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 305, in run
SystemLog: return self.fn.run(*args, **kwargs)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
SystemLog: self.cache[device][key] = compile(
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
SystemLog: next_module = compile_ir(module, metadata)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
SystemLog: stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
SystemLog: File "/azureml-envs/azureml_50694e7d12e9be98761297f3c3adb59f/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
SystemLog: return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
SystemLog: RuntimeError: `ptxas` failed with error code 2:
SystemLog:
SystemLog:ERROR:__main__:An error occurred during execution
Resolved by creating the directory `0.1` referenced in the error line: `sh: 1: cannot create 0.1/compile-ptx-log-7f1750: Directory nonexistent`.
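The workaround can be sketched as follows. The path `0.1` is taken verbatim from the error message; it appears to be a relative directory that Triton's `ptxas` step uses for its compile log, resolved against the process's working directory:

```python
import os

# Workaround: the ptxas compile step tries to write its log to the relative
# path "0.1/compile-ptx-log-<id>" under the current working directory and
# fails with "Directory nonexistent" when that directory is missing.
# Pre-creating it before calling model.generate() lets the Triton
# blocksparse-attention kernel compilation proceed.
os.makedirs("0.1", exist_ok=True)
```

Note that this directory must exist in the working directory of every rank, since each process compiles the kernel independently.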
Satandon1999 changed discussion status to closed.