Reproduction Failure with Llama 3 Instruct

#31
by WYJLUAI - opened

Hi, thanks for sharing. I'm trying to reproduce the results from the original paper on the QuAC dataset using Llama-3-8B-instruct, but I'm only achieving an F1 score of 27.85 while the paper reports 33.60.

I'm wondering whether any parameters (e.g., max_tokens) that are not reflected on Hugging Face could affect this result.

Any guidance would be greatly appreciated!
