The model keeps outputting "pass" for questions in HumanEval

#23
by mz227 - opened

I am trying to use the Hugging Face `transformers` library with `model.generate` to replicate the reported results on the HumanEval dataset.
However, the model keeps outputting `pass` for nearly all of the problems.

I am not sure whether the tokenizer and generation settings of the HF version are identical to those of the mistral-inference version.
Has anyone had the same experience? I would like to know how to fix it.

I understand that the mistral-inference library generates the expected results; however, I want to make some modifications to the Mistral models, which requires me to reproduce the results in the default Hugging Face training/inference pipeline first.
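In case it helps narrow things down: a common cause of degenerate completions like `pass` is the evaluation setup rather than the weights, i.e. how the completion is decoded and truncated. HumanEval-style evaluation (following the Codex paper) cuts the raw generation at a fixed set of stop sequences before executing it. A minimal sketch of that truncation step, assuming the standard Codex-paper stop markers (the function name and sample string here are illustrative):

```python
# Stop sequences from the Codex paper's HumanEval evaluation:
# generation is truncated at the first occurrence of any of them.
STOP_SEQUENCES = ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]

def truncate_completion(completion: str, stops=STOP_SEQUENCES) -> str:
    """Cut the raw model output at the earliest stop sequence, if any."""
    cut = len(completion)
    for stop in stops:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]

# Without truncation, text after the target function body (e.g. a stray
# follow-up definition containing `pass`) would be executed as part of
# the candidate solution.
raw = "    return x + 1\n\ndef unrelated():\n    pass\n"
print(repr(truncate_completion(raw)))
```

If the harness skips this step, or if extra special tokens from the HF tokenizer (e.g. a duplicated BOS) shift the prompt, the scored completion can easily collapse to a trivial `pass` body even when the model itself is fine.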
