smaller model

#1
opened by bh4

Please change the model quantization to q4f16 and also simplify the graph with onnxslim so that the demo can run on smartphones with limited RAM. It currently runs well on my PC but goes OOM on my Android phone with 4 GB of RAM.
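If it helps, here is a minimal sketch of what the demo-side change might look like once a q4f16 variant is published, assuming the standard Transformers.js pipeline API; the model id below is a placeholder. onnxslim itself would be applied at export time to the ONNX file (e.g. `onnxslim model.onnx model_slim.onnx`), so the browser code only needs to request the smaller dtype:

```ts
// Sketch: loading a model in Transformers.js with 4-bit weights / fp16 activations.
// Assumes the repo ships a q4f16 ONNX variant; 'your-org/your-model' is a placeholder.
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'your-org/your-model', {
  dtype: 'q4f16',   // 4-bit quantized weights with fp16 activations; far smaller than fp32
  device: 'webgpu', // or 'wasm' on phones without WebGPU support
});
```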

@Xenova Along with the above, please also add a tokens-decoded-per-second stat during inference.
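For the throughput stat, one option is to time tokens from a `TextStreamer` callback. A rough sketch, assuming the callback fires roughly once per generated token (the model id is again a placeholder):

```ts
// Sketch: measuring decode throughput via a TextStreamer callback.
import { pipeline, TextStreamer } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'your-org/your-model', {
  dtype: 'q4f16',
});

let numTokens = 0;
let startTime: number | null = null;

const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true, // only count newly generated tokens, not the prompt
  callback_function: (text: string) => {
    startTime ??= performance.now(); // start the clock at the first decoded token
    numTokens += 1;
    const elapsed = (performance.now() - startTime) / 1000;
    const tps = elapsed > 0 ? (numTokens / elapsed).toFixed(2) : '...';
    console.log(`${numTokens} tokens, ${tps} tokens/s`);
  },
});

await generator('Write a haiku about RAM.', { max_new_tokens: 64, streamer });
```

Starting the timer at the first callback means the stat measures decode throughput only, excluding prompt prefill, which is usually what is meant by tokens decoded per second.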
