Discrepancy in Model Performance Using HuggingFace Pipeline Utility


Hi @nielsr

I am attempting to reproduce the performance metrics of the models using the checkpoints from the Hugging Face Hub and comparing against the original AST GitHub repository, but I am getting different results. The metrics I recorded are as follows:

| Checkpoint | mAP | AUC-ROC |
|---|---|---|
| MIT/ast-finetuned-audioset-16-16-0.442 | 0.4040 | 0.9671 |
| MIT/ast-finetuned-audioset-10-10-0.4593 | 0.4256 | 0.9737 |
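
For context, here is a minimal sketch of the evaluation loop I am running. It assumes a local AudioSet eval set exposed as `eval_clips`, an iterable of `(waveform, multi_hot_label)` pairs resampled to 16 kHz (the loader and resampling are my own code, not part of transformers). I call the model directly rather than through `pipeline()` so I keep the full 527-dimensional score vector:

```python
import numpy as np
import torch
from sklearn.metrics import average_precision_score, roc_auc_score
from transformers import ASTFeatureExtractor, ASTForAudioClassification

checkpoint = "MIT/ast-finetuned-audioset-10-10-0.4593"
extractor = ASTFeatureExtractor.from_pretrained(checkpoint)
model = ASTForAudioClassification.from_pretrained(checkpoint).eval()

all_scores, all_labels = [], []
for waveform, label in eval_clips:  # eval_clips: my assumed AudioSet eval loader
    inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Sigmoid, not softmax: AudioSet tagging is a multi-label task
    all_scores.append(torch.sigmoid(logits)[0].numpy())
    all_labels.append(label)

scores, labels = np.stack(all_scores), np.stack(all_labels)
# Macro-average over the 527 classes, as in the AST paper
print("mAP:", average_precision_score(labels, scores, average="macro"))
print("AUC-ROC:", roc_auc_score(labels, scores, average="macro"))
```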

These results fall noticeably short of the expected performance (the mAP values encoded in the checkpoint names: 0.442 and 0.4593, respectively). Additionally, the number of parameters differs: 86.6M for MIT/ast-finetuned-audioset-10-10-0.4593 compared to 88.1M in the original implementation. For reference, I downloaded AudioSet from this repo.
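
For completeness, this is how I am counting parameters on the Hub checkpoint; it is the standard PyTorch idiom, so the 86.6M figure should not be an artifact of how I count:

```python
from transformers import ASTForAudioClassification

model = ASTForAudioClassification.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # prints ~86.6M for this checkpoint
```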

I have also opened an issue in the author's GitHub repository. Do you have any insights or thoughts on this matter? Any assistance would be greatly appreciated.
