Caching doesn't work on multi-GPU

#23
by eastwind - opened

I get gibberish output if caching is enabled when running inference across multiple GPUs.

@eastwind, so you do not get gibberish every time?
Would you kindly post some non-gibberish examples?
What did you do to go from gibberish to English?

@eastwind I have now found your contribution here, which answers the last question. Thanks!
https://huggingface.co/tiiuae/falcon-40b-instruct/discussions/20

Yeah, not using the cache hurts performance a lot.
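
For reference, a minimal sketch of that workaround (disabling the KV cache at generation time), assuming Falcon is sharded across the visible GPUs with `device_map="auto"` via `accelerate`; the prompt and generation parameters are just placeholders:

```python
# Sketch of the workaround discussed above: pass use_cache=False to generate()
# when the model is sharded across multiple GPUs. Prompt/settings are examples only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",        # shard the model across all visible GPUs (requires accelerate)
    trust_remote_code=True,   # Falcon shipped custom modeling code at the time
)

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    use_cache=False,          # workaround: skip the KV cache to avoid gibberish, at a speed cost
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```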

Technology Innovation Institute org

We recommend using Text Generation Inference for fast inference with Falcon. See this blog for more information.
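
As a rough sketch of that route (an assumption about typical usage, not taken from the blog): once a Text Generation Inference server is running locally with Falcon sharded over the GPUs, it can be queried from Python with the `text_generation` client. The endpoint URL and generation parameters below are placeholders:

```python
# Hypothetical example of querying a locally running TGI server for Falcon.
# Assumes the server has already been launched and is listening on port 8080.
from text_generation import Client

client = Client("http://127.0.0.1:8080")
response = client.generate(
    "What is the capital of France?",
    max_new_tokens=64,
)
print(response.generated_text)
```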
