Core dumps in onnxruntime

#6
by narugo - opened

When I use this model in onnxruntime, like this:

import numpy as np
import onnxruntime
from huggingface_hub import hf_hub_download

# Download the ONNX model and create a session on the CUDA provider
session = onnxruntime.InferenceSession(hf_hub_download(
    repo_id='SmilingWolf/wd-v1-4-swinv2-tagger-v2',
    repo_type="model",
    filename="model.onnx"
), providers=['CUDAExecutionProvider'])
input_name = session.get_inputs()[0].name
# NHWC float32 input, as the model expects
input_ = np.random.randn(1, 448, 448, 3).astype(np.float32)
print(session.run(None, {input_name: input_}))  # None -> fetch all outputs

then it core dumps.

When I use the CPU provider, the error is gone.

Luckily, all the other taggers (including v2 and v3) work fine on the CUDA provider; only this one errors when running on CUDA.

So maybe some error occurred when exporting this model to ONNX format?

I can reproduce this error in the following environments:

  • Ubuntu 20.04, Python 3.8.18, onnxruntime-gpu 1.17.1, RTX 2060 GPU
  • Ubuntu 20.04, Python 3.10.14, onnxruntime-gpu 1.17.1, A100 80 GB GPU
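A workaround consistent with the observations above (the CPU provider works; CUDA on 1.17.1 crashes for this one model) can be sketched as a small helper. The function name and the version/model check are mine, not part of the onnxruntime API:

```python
# Sketch of a workaround: avoid the CUDA provider only for the combination
# observed to crash (this model on onnxruntime-gpu 1.17.1).
# pick_providers is a hypothetical helper, not an onnxruntime API.

def pick_providers(ort_version: str, model_name: str) -> list:
    """Choose execution providers; fall back to CPU for the known-bad combo."""
    if model_name == "wd-v1-4-swinv2-tagger-v2" and ort_version.startswith("1.17.1"):
        return ["CPUExecutionProvider"]
    # Listing CPU after CUDA lets onnxruntime fall back if CUDA is unavailable.
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]
```

The result would be passed as `providers=` to `onnxruntime.InferenceSession`, with the version taken from `onnxruntime.__version__`.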

It's 100% an ONNXRuntime 1.17.1 bug. They supposedly released v1.17.3 to address the issue (among others), but somehow managed to fail to publish it on PyPI.
For three weeks. With an open issue about the missing release. Nice.

I have tested v1.17.1: it core dumps.
I have tested ort_nightly_gpu-1.17.3.dev20240409002: it works.
I have tested ort_nightly_gpu-1.18.0.dev20240430005: it works.
I have tested ort_nightly_gpu-1.19.0.dev20240513003: it works.
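The results above boil down to a simple cutoff, which can be sketched as a version check. The helper and the tuple comparison are illustrative only (assumption: nightly version strings compare by their numeric prefix, with the `.dev` suffix stripped):

```python
# Sketch summarizing the test results above: 1.17.1 core dumps,
# 1.17.3 and every later nightly tested here works.

def is_known_good(version: str) -> bool:
    """True if this onnxruntime-gpu version avoided the core dump above."""
    core = version.split(".dev")[0]          # "1.17.3.dev2024..." -> "1.17.3"
    parts = tuple(int(p) for p in core.split("."))
    return parts >= (1, 17, 3)
```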

Supposedly they are about to release v1.18; if they don't fuck it up midway, it should work.
If a downgrade is possible, I think I remember v1.16.3 worked alright for v2 models. I can't easily test that on Arch right now though.
It will not be compatible with v3 models, of course.
Your best option might be to download the wheels from the links above and hope for a fast release by MS. Sorry for the inconvenience.

All tests run on an up-to-date Arch Linux machine, python 3.12.3, RTX 4090 GPU.

Hey, so I found out from your post in the other thread you're the DeepGHS guy. I'm a fan of your work!

I went and reexported the model. I also ported it to TIMM and the JAX codebase, for good measure.
It doesn't crash anymore on my machine, if you test it please report back.


Runnable on onnxruntime-gpu 1.17.1 now. Thank you for your excellent work and the fix!

I'm working on anime waifus because they are so hot, lol

narugo changed discussion status to closed
