The model is not optimized for inference

by Imran1 - opened Oct 17

Imran1

Oct 17

I load the model for inference on 2 H100 GPUs, but it is very slow with Flash Attention.

Oct 17

Can you share your code?
