NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.
Sorry for posting repeatedly.
I'm reporting an unusual error.
By the way, the error was already occurring before the last PR, so that fix should not be the culprit.
It works in someone else's Space, so I hope it is simply a lack of VRAM...
I can't tell whether it's a problem with my code, a conflict with some library, a version-dependent error in HF's libraries or PyTorch, one of the bugs currently happening across the HF site, or a model error that occurs only under certain conditions.
Note that normal inference and various other processes using FluxPipeline run without error; the error occurs only when the inference part of the code block below is reached.
Error Log
NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.
Traceback (most recent call last):
File "/home/user/app/app.py", line 142, in generate_image
image = pipe(
File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 711, in __call__
) = self.encode_prompt(
File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 370, in encode_prompt
prompt_embeds = self._get_t5_prompt_embeds(
File "/usr/local/lib/python3.10/site-packages/diffusers/pipelines/flux/pipeline_flux_controlnet.py", line 256, in _get_t5_prompt_embeds
prompt_embeds = self.text_encoder_2(text_input_ids.to(device), output_hidden_states=False)[0]
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1971, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1106, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 686, in forward
self_attention_outputs = self.layer[0](
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 593, in forward
attention_output = self.SelfAttention(
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 553, in forward
attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 253, in thread_wrapper
res = future.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/user/app/app.py", line 156, in generate_image
raise gr.Error(f"Inference Error: {e}")
gradio.exceptions.Error: 'Inference Error: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch. '
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 285, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1508, in call_function
prediction = await anyio.to_thread.run_sync( # type: ignore
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 818, in wrapper
response = f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 818, in wrapper
response = f(*args, **kwargs)
File "/home/user/app/app.py", line 201, in run_lora
image = generate_image(prompt_mash, steps, seed, cfg_scale, width, height, lora_scale, cn_on, progress)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 214, in gradio_handler
raise res.value
gradio.exceptions.Error: 'Inference Error: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":838, please report a bug to PyTorch. '
Code Summary
- Loading Part (Summarized code)
global controlnet_union # Originally located elsewhere
global controlnet # Originally located elsewhere
global pipe # Originally located elsewhere
repo_id = "camenduru/FLUX.1-dev-diffusers" # Originally located elsewhere, but this value is assigned
controlnet_model_union_repo = 'InstantX/FLUX.1-dev-Controlnet-Union' # Originally located elsewhere, but this value is assigned
dtype = torch.bfloat16 # Originally located elsewhere, but this value is assigned
controlnet_union = FluxControlNetModel.from_pretrained(controlnet_model_union_repo, torch_dtype=dtype)
controlnet = FluxMultiControlNetModel([controlnet_union])
pipe = FluxControlNetPipeline.from_pretrained(repo_id, controlnet=controlnet, torch_dtype=dtype)
- Inference Part (Actual code)
if controlnet is not None: controlnet.to("cuda") # Crashes here.
if controlnet_union is not None: controlnet_union.to("cuda") # Crashes here.
image = pipe( # Without the above statement, it would crash here.
    prompt=prompt_mash,
    control_image=images,
    control_mode=modes,
    num_inference_steps=steps,
    guidance_scale=cfg_scale,
    width=width,
    height=height,
    controlnet_conditioning_scale=scales,
    generator=generator,
    joint_attention_kwargs={"scale": lora_scale},
).images[0]
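One way to test the lack-of-VRAM guess would be to log free GPU memory just before the lines that crash. The following is only a minimal sketch, not a fix: `format_gib` and `vram_report` are hypothetical helper names, and it assumes the check runs in the same process and function where `controlnet.to("cuda")` is called.

```python
def format_gib(num_bytes: int) -> str:
    """Render a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024 ** 3:.2f} GiB"


def vram_report(tag: str) -> str:
    """Return a one-line summary of VRAM usage, or the reason it is unavailable."""
    try:
        import torch  # imported lazily so the helper also loads where torch is absent
    except ImportError:
        return f"[{tag}] torch is not installed"
    if not torch.cuda.is_available():
        return f"[{tag}] CUDA is not available in this process"
    free, total = torch.cuda.mem_get_info()    # bytes, for the current device
    allocated = torch.cuda.memory_allocated()  # bytes held by tensors in this process
    return (
        f"[{tag}] free {format_gib(free)} of {format_gib(total)}, "
        f"allocated by this process {format_gib(allocated)}"
    )
```

Calling `print(vram_report("before controlnet.to"))` right before the two `.to("cuda")` lines, and again before `pipe(...)`, would show whether free memory is already close to zero at the point of the crash.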
Dependency
spaces
git+https://github.com/huggingface/diffusers
torch
torchvision
huggingface_hub
accelerate
transformers
peft
sentencepiece
timm
einops
controlnet-aux
kornia
numpy
opencv-python
deepspeed
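Because the dependencies above are unpinned (and diffusers is installed from git), it may be hard to tell whether this is a version-dependent error. A small sketch like the following could record which versions the Space actually resolved, so they can be compared with the Space where it works. `installed_versions` and `PACKAGES` are hypothetical names; only the standard library is used.

```python
from importlib import metadata

# Packages from the dependency list that seem most relevant to the traceback.
PACKAGES = ["torch", "diffusers", "transformers", "accelerate", "peft", "spaces"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Printing this once at startup and including the output in a bug report would make it easier for others to reproduce the environment.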
Actual Space
https://huggingface.co/spaces/John6666/flux-lora-the-explorer
Excuse me, I ran into a similar problem. Did you solve it?
Unfortunately, I don't think it's been resolved yet. I dropped the feature itself, so it will take some time to verify.
Thanks:(