Does this accelerated version still support the original model's quantize interface (4-bit and 8-bit quantization)?

#11

by goldenteethCN - opened May 25, 2023

May 25, 2023

Alternatively, have you considered releasing 4-bit or 8-bit versions of the accelerated model directly?

Tencent Music Entertainment Lyra Lab org May 31, 2023

@vanewu Please take a look at this question.

vanewu

Tencent Music Entertainment Lyra Lab org Jun 2, 2023

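@goldenteethCN We have released a new accelerated version and removed the earlier TensorRT-based one. The new version includes many source-level optimizations and is better in performance, ease of use, and GPU compatibility, so we recommend updating to it. Quantization is not supported yet; we will decide whether to add it based on demand.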

vanewu

Tencent Music Entertainment Lyra Lab org Jun 12, 2023

@goldenteethCN The new version supports int8 mode; see the readme for details. We have not run an int4 comparison, however, so there may be differences in accuracy.

vanewu changed discussion status to closed Jun 12, 2023
