A lot of <unk> generations in the cuda int 4 model.

#12

by Satandon1999 - opened Jul 12, 2024

Satandon1999

Jul 12, 2024

I am using a code derived from https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/model-generate.py .

I have tried the cpu-int4 and the cuda-int4 models using the same data and code. Where the cpu model seems to be working fine, the cuda model is generating almost all of the tokens as 0 (which get decoded to ).

Versions:
onnxruntime-genai-cuda 0.3.0
torch 2.3.1+cu118

Is this being caused by some package version issues? Does anyone have any idea regarding this?

kvaishnavi

Microsoft org Jul 17, 2024

Can you try using the Phi-3 example scripts such as this one? If you want to use the model-generate.py example, the chat template is also necessary for Phi-3. An example of the chat template for Phi-3 mini can be found here.

kvaishnavi changed discussion status to closed Aug 5, 2024

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment