Core dumps in onnxruntime

#6
by narugo - opened

When I use this model in onnxruntime, like this:

import numpy as np
import onnxruntime
from huggingface_hub import hf_hub_download

# Download the ONNX model and create a session on the CUDA provider
session = onnxruntime.InferenceSession(hf_hub_download(
    repo_id='SmilingWolf/wd-v1-4-swinv2-tagger-v2',
    repo_type="model",
    filename="model.onnx"
), providers=['CUDAExecutionProvider'])
input_name = session.get_inputs()[0].name
# NHWC float32 input, as the model expects
input_ = np.random.randn(1, 448, 448, 3).astype(np.float32)
print(session.run(None, {input_name: input_}))  # None -> fetch all outputs

then it core dumps.

When I use the CPU provider, the error is gone.

Luckily, all the other taggers (including v2 and v3) work fine on the CUDA provider; only this one errors when running on CUDA.

So maybe some error occurred when exporting this model to ONNX format?

I can reproduce this error in the following environments:

  • Ubuntu 20.04, Python 3.8.18, onnxruntime-gpu 1.17.1, RTX 2060 GPU
  • Ubuntu 20.04, Python 3.10.14, onnxruntime-gpu 1.17.1, A100 80 GB GPU
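A workaround consistent with the observations above (the CPU provider works; CUDA on 1.17.1 crashes for this one model) can be sketched as a small helper. The function name and the version/model check are mine, not part of the onnxruntime API:

```python
# Sketch of a workaround: avoid the CUDA provider only for the combination
# observed to crash (this model on onnxruntime-gpu 1.17.1).
# pick_providers is a hypothetical helper, not an onnxruntime API.

def pick_providers(ort_version: str, model_name: str) -> list:
    """Choose execution providers; fall back to CPU for the known-bad combo."""
    if model_name == "wd-v1-4-swinv2-tagger-v2" and ort_version.startswith("1.17.1"):
        return ["CPUExecutionProvider"]
    # Listing CPU after CUDA lets onnxruntime fall back if CUDA is unavailable.
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]
```

The result would be passed as `providers=` to `onnxruntime.InferenceSession`, with the version taken from `onnxruntime.__version__`.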

It's 100% an ONNXRuntime 1.17.1 bug. They supposedly released v1.17.3 to address the issue (among others), but somehow managed to fail to publish it on PyPI.
For three weeks. With an open issue about the missing release. Nice.

I have tested v1.17.1: it core dumps.
I have tested ort_nightly_gpu-1.17.3.dev20240409002: it works.
I have tested ort_nightly_gpu-1.18.0.dev20240430005: it works.
I have tested ort_nightly_gpu-1.19.0.dev20240513003: it works.
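The results above boil down to a simple cutoff, which can be sketched as a version check. The helper and the tuple comparison are illustrative only (assumption: nightly version strings compare by their numeric prefix, with the `.dev` suffix stripped):

```python
# Sketch summarizing the test results above: 1.17.1 core dumps,
# 1.17.3 and every later nightly tested here works.

def is_known_good(version: str) -> bool:
    """True if this onnxruntime-gpu version avoided the core dump above."""
    core = version.split(".dev")[0]          # "1.17.3.dev2024..." -> "1.17.3"
    parts = tuple(int(p) for p in core.split("."))
    return parts >= (1, 17, 3)
```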

Supposedly they are about to release v1.18; if they don't fuck it up midway, it should work.
If a downgrade is possible, I think I remember v1.16.3 worked alright for v2 models. I can't easily test that on Arch right now though.
It will not be compatible with v3 models, of course.
Your best option might be to download the wheels from the links above and hope for a fast release by MS. Sorry for the inconvenience.

All tests run on an up-to-date Arch Linux machine, python 3.12.3, RTX 4090 GPU.

Hey, so I found out from your post in the other thread you're the DeepGHS guy. I'm a fan of your work!

I went and reexported the model. I also ported it to TIMM and the JAX codebase, for good measure.
It doesn't crash anymore on my machine, if you test it please report back.


Runnable on onnxruntime-gpu 1.17.1 now. Thank you for your excellent work and the fix!

I'm working on anime waifus because they are so hot, lol

narugo changed discussion status to closed
