Execution script from the video demo
Glad to see such amazing work!
I would like to know whether the execution script shown in the video demo will be released.
More detailed code would also be appreciated.
Will you release more example code for use on Android devices? If so, when?
Thanks again for your wonderful work.
I tried the example provided by the team and got a result with a latency of 20.7 s.
Could you share the latency measured on your device, so that I can compare it with the accelerated model inference described in the paper?
Looking forward to your reply.
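For anyone comparing numbers, here is a minimal sketch of how a latency figure like the one above could be measured reproducibly. It is not the team's benchmark script. `measure_latency` and `fake_inference` are hypothetical names, and `fake_inference` is only a stand-in for the real model call.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(fn: Callable[[], object], warmup: int = 1, runs: int = 5) -> List[float]:
    """Time fn() over several runs, after untimed warm-up calls."""
    # Warm-up runs absorb one-off costs (model loading, cache fills, JIT).
    for _ in range(warmup):
        fn()
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def fake_inference() -> None:
    # Placeholder for the real inference call (assumption, not the actual model).
    time.sleep(0.01)


timings = measure_latency(fake_inference, warmup=1, runs=3)
print(f"median latency: {statistics.median(timings):.3f} s over {len(timings)} runs")
```

Reporting the median over several runs, together with the device and the number of generated tokens, makes figures from different setups easier to compare.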
Language models can be optimized in many ways, such as KV caching, quantization, model pruning, and specific memory access patterns. Try applying these techniques.
Alternatively, you can join our waitlist at https://www.nexa4ai.com/contact, and we will provide solutions for the above.
@AcceleratedNpc Which GPU are you using? Also, please implement an early stopping criterion: once you observe the end token, inference can stop.
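A rough sketch of the early stopping idea, assuming a token-by-token decoding loop. The actual stop token is not named in this thread, so `stop_id` is a placeholder, and `ToyModel` is a hypothetical stand-in for the real model's next-token step.

```python
from typing import Callable, List


def generate_with_early_stop(
    next_token: Callable[[List[int]], int],
    prompt_ids: List[int],
    stop_id: int,
    max_new_tokens: int = 256,
) -> List[int]:
    """Decode one token at a time and return as soon as stop_id appears."""
    ids = list(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(ids)
        if tok == stop_id:
            # Stop here instead of decoding up to max_new_tokens.
            break
        generated.append(tok)
        ids.append(tok)
    return generated


class ToyModel:
    """Hypothetical stand-in: replays a fixed script of token IDs."""

    def __init__(self, script: List[int]) -> None:
        self.script = script
        self.calls = 0

    def __call__(self, ids: List[int]) -> int:
        tok = self.script[min(self.calls, len(self.script) - 1)]
        self.calls += 1
        return tok


model = ToyModel([1, 2, 3, 99])  # 99 plays the role of the stop token (assumption)
out = generate_with_early_stop(model, [10, 11], stop_id=99, max_new_tokens=50)
print(out, model.calls)
```

In this toy run the loop makes 4 model calls instead of 50, which is where the latency saving comes from. Many inference libraries expose an equivalent setting for an end-of-sequence token, so a hand-written loop may not be needed.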