THUDM/cogvlm2-llama3-chat-19B · Output discrepancy between the demo and a local run

Hello! Congratulations on the amazing release!!

I have just tried the full model using the "cli_demo_multi_gpus.py" script on an EC2 instance, and it works like a charm.

I have one concern, though: the outputs I am getting from it do not match the ones from the demo website!
I am setting the parameters exactly the same as in the demo (as you can see in the screenshot).

Basically, the demo app gives much better responses (verbose and precise) than the multi-GPU script, whose output is short and vague.
Any idea what might be causing that?

Note: I am running the model in bf16 precision.