How to run inference in FP8?

#38
by codewithRiz - opened

We have tested on an A100 as well as an RTX 4070 locally, serving through a Flask API. On average, a 1-minute video takes about 1.5 minutes of inference time. I even tried uploading the video to the server ahead of time and passing just a video ID, so that upload time is removed from the request. I also tried torch.compile, but I'm not sure how to reduce inference time further.
Any tips?