Reproduction failure with Llama 3 Instruct
#31, opened by WYJLUAI
Hi, thanks for sharing. I'm trying to reproduce the results from the original paper on the QuAC dataset using Llama-3-8B-instruct, but I'm only achieving an F1 score of 27.85 while the paper reports 33.60.
I'm wondering whether any parameters that affect this result (e.g., max_tokens) are missing from what is published on Hugging Face.
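To illustrate why a generation-length setting could matter here, below is a minimal sketch of a SQuAD-style token-level F1. It assumes the evaluation uses this kind of metric; the paper's actual scoring script may normalize text differently. The names `normalize` and `token_f1` are hypothetical helpers, not taken from the paper or any library.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: this mirrors common SQuAD-style normalization, not
    necessarily the normalization used in the paper.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, reference):
    """Token-overlap F1 between one prediction and one reference answer."""
    pred = normalize(prediction)
    ref = normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "in 1990"
    print(token_f1("in 1990", reference))
    print(token_f1("The answer is in 1990 according to the passage", reference))
```

Under a metric like this, a longer generation that still contains the correct span scores lower because precision drops, so a different token limit or stopping rule alone could plausibly shift the aggregate F1.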
Any guidance would be greatly appreciated!