The model is not optimized for inference

#7
by Imran1 - opened

I loaded the model for inference on 2 H100 GPUs, but it is very slow with Flash Attention.

Can you share your code?
