smaller model

#1
opened by bh4

Please change the model quantization to q4f16 and also simplify the graph with onnxslim so that the demo can run on smartphones with limited RAM. It currently runs well on my PC but goes OOM on my Android phone with 4 GB of RAM.
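If it helps, here is a minimal sketch of what the demo-side change might look like once a q4f16 variant is published, assuming the standard Transformers.js pipeline API; the model id below is a placeholder. onnxslim itself would be applied at export time to the ONNX file (e.g. `onnxslim model.onnx model_slim.onnx`), so the browser code only needs to request the smaller dtype:

```ts
// Sketch: loading a model in Transformers.js with 4-bit weights / fp16 activations.
// Assumes the repo ships a q4f16 ONNX variant; 'your-org/your-model' is a placeholder.
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'your-org/your-model', {
  dtype: 'q4f16',   // 4-bit quantized weights with fp16 activations; far smaller than fp32
  device: 'webgpu', // or 'wasm' on phones without WebGPU support
});
```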

@Xenova Along with the above, please also add a tokens-decoded-per-second stat during inference.
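For the throughput stat, one option is to time tokens from a `TextStreamer` callback. A rough sketch, assuming the callback fires roughly once per generated token (the model id is again a placeholder):

```ts
// Sketch: measuring decode throughput via a TextStreamer callback.
import { pipeline, TextStreamer } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'your-org/your-model', {
  dtype: 'q4f16',
});

let numTokens = 0;
let startTime: number | null = null;

const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true, // only count newly generated tokens, not the prompt
  callback_function: (text: string) => {
    startTime ??= performance.now(); // start the clock at the first decoded token
    numTokens += 1;
    const elapsed = (performance.now() - startTime) / 1000;
    const tps = elapsed > 0 ? (numTokens / elapsed).toFixed(2) : '...';
    console.log(`${numTokens} tokens, ${tps} tokens/s`);
  },
});

await generator('Write a haiku about RAM.', { max_new_tokens: 64, streamer });
```

Starting the timer at the first callback means the stat measures decode throughput only, excluding prompt prefill, which is usually what is meant by tokens decoded per second.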
