text-generation-inference: chaotic/unusable answers
Hi, I'm running the model with TGI from Hugging Face with the following parameters:
export MODEL=mistralai/Mixtral-8x7B-Instruct-v0.1
export TAG=2.0.3
#export TAG=latest
docker run -it --name tgi_server \
  --hostname 0.0.0.0 \
  --gpus all \
  --shm-size 1g \
  -e HUGGING_FACE_HUB_TOKEN=$token \
  -p 8080:80 \
  -v ${TGI_VOLUME}:/data \
  ghcr.io/huggingface/text-generation-inference:${TAG} \
  --model-id $MODEL \
  --max-input-length 3000 \
  --max-total-tokens 3500 \
  --max-batch-prefill-tokens 5000 \
  --num-shard 2 \
  --quantize eetq \
  --ngrok
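For reference, this is roughly how I query the server (a minimal sketch against TGI's standard /generate endpoint on the mapped port 8080; the prompt and max_new_tokens value are just placeholders, using Mixtral's [INST] instruct template):

# Hypothetical test request; adjust host, prompt, and parameters as needed.
curl http://127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "[INST] What is the capital of France? [/INST]", "parameters": {"max_new_tokens": 64}}'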
Now I have the problem that the model's answers are unusable.
See for yourself:
Do you have any idea what my problem is here?
Thanks in advance!
Lukas