A lot of <unk> generations in the cuda int 4 model.
#12 opened by Satandon1999
I am using code derived from https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-generate.py.
I have tried the cpu-int4 and the cuda-int4 models with the same data and code. The cpu model seems to work fine, but the cuda model generates almost every token as 0 (which gets decoded to <unk>).
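To put a number on the difference between the two models, a small helper like the one below could be run on the token IDs each model returns. It is only a sketch: `unk_fraction` is a hypothetical helper, not part of the example script, the token lists are made-up placeholders, and it assumes that token ID 0 is the one that decodes to <unk>, as observed above.

```python
from collections import Counter
from typing import Sequence


def unk_fraction(token_ids: Sequence[int], unk_id: int = 0) -> float:
    """Return the share of tokens in token_ids that equal unk_id."""
    if not token_ids:
        return 0.0
    return Counter(token_ids)[unk_id] / len(token_ids)


# Placeholder sequences for illustration only; replace them with the
# token IDs actually produced by the cpu-int4 and cuda-int4 models.
cpu_tokens = [306, 626, 263, 1904, 2]
cuda_tokens = [306, 0, 0, 0, 0]

print(f"cpu-int4:  {unk_fraction(cpu_tokens):.0%} of tokens are id 0")
print(f"cuda-int4: {unk_fraction(cuda_tokens):.0%} of tokens are id 0")
```

Running this on the same prompts for both models gives a comparable figure that can be quoted in a bug report.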
Versions:
onnxruntime-genai-cuda 0.3.0
torch 2.3.1+cu118
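For anyone trying to reproduce this, the installed versions can be read programmatically with the standard-library `importlib.metadata` module. This is a minimal sketch: `installed_versions` is a hypothetical helper name, and the package list simply mirrors the two packages listed above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


# Package names taken from the version list in this post.
for name, ver in installed_versions(["onnxruntime-genai-cuda", "torch"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```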
Could this be caused by a package version mismatch? Does anyone have an idea what is going wrong?
kvaishnavi changed discussion status to closed