Add xl variant
#3
by bayang - opened
contains:
- config-flan-t5-xl.json
- model-flan-t5-xl.gguf
quantization: q6k
Looks good, could you mention the command line / code change that you needed to be able to test this and how I can run it to try it out?
- Quantization:
cargo run --example tensor-tools --release -- quantize --quantization q6k PATH/TO/T5/model.safetensors /tmp/model.gguf
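(Not part of this PR, just a hedged sanity check: a minimal Rust sketch, assuming candle-core's gguf_file reader, that lists each tensor's shape and ggml dtype in the output file so you can confirm the large weight matrices ended up as Q6K. The /tmp/model.gguf path is the one from the command above.)

```rust
use candle_core::quantized::gguf_file;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Output of the tensor-tools quantize command above.
    let path = "/tmp/model.gguf";
    let mut file = std::fs::File::open(path)?;
    let content = gguf_file::Content::read(&mut file)?;

    // Print each tensor's shape and ggml dtype; the big weight matrices
    // should show up as Q6K, while some small tensors may stay unquantized.
    for (name, info) in content.tensor_infos.iter() {
        println!("{name:50} {:?} {:?}", info.shape, info.ggml_dtype);
    }
    Ok(())
}
```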
- Testing:
From Candle, I pointed the example at my repo deepfile/flan-t5-xl-gguf instead of lmz/candle-quantized-t5, because it contains the model-flan-t5-xl.gguf file in the main branch.
cargo run --example quantized-t5 --release -- --prompt "translate to German: I'm living in Paris." --model-id "deepfile/flan-t5-xl-gguf" --which "flan-t5-xl"
...
Ich wohne in Paris.
8 tokens generated (7.76 token/s)
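For context on the code-change side of the question above, here is a hedged sketch of how a new checkpoint variant is typically wired into the quantized-t5 example's clap arguments (assuming clap with the derive feature; the enum, argument, and helper names are illustrative, not the exact diff):

```rust
use clap::{Parser, ValueEnum};

// Illustrative only: the example's real enum and argument names may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, ValueEnum)]
enum Which {
    FlanT5Small,
    FlanT5Base,
    FlanT5Large,
    // New variant so that `--which flan-t5-xl` is accepted.
    FlanT5Xl,
}

impl Which {
    // Map each variant to the config / weight file names stored in the repo.
    fn file_names(&self) -> (&'static str, &'static str) {
        match self {
            Self::FlanT5Small => ("config-flan-t5-small.json", "model-flan-t5-small.gguf"),
            Self::FlanT5Base => ("config-flan-t5-base.json", "model-flan-t5-base.gguf"),
            Self::FlanT5Large => ("config-flan-t5-large.json", "model-flan-t5-large.gguf"),
            Self::FlanT5Xl => ("config-flan-t5-xl.json", "model-flan-t5-xl.gguf"),
        }
    }
}

#[derive(Parser, Debug)]
struct Args {
    /// Hub repo to pull the quantized files from, e.g. deepfile/flan-t5-xl-gguf.
    #[arg(long)]
    model_id: Option<String>,
    /// Which checkpoint variant to load.
    #[arg(long, value_enum)]
    which: Which,
}

fn main() {
    let args = Args::parse();
    let (config, weights) = args.which.file_names();
    println!("model-id={:?} config={config} weights={weights}", args.model_id);
}
```

With such a variant in place, --which "flan-t5-xl" together with --model-id "deepfile/flan-t5-xl-gguf" resolves to the config-flan-t5-xl.json and model-flan-t5-xl.gguf files uploaded in this PR.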
( @lmz But this xl quantized model is worse than the quantized large one on open-domain questions. I haven't tested it yet on context-based QA. )
lmz changed pull request status to merged