ARCH is a framework for benchmarking audio representations. It gives researchers a unified way to compare their audio representations and gives the community a common benchmark for evaluating models. The project is currently in its first release. Details about the datasets and the models are available in the GitHub repository.
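Datasets are grouped by domain: Audio Events (ESC-50, US8K, FSD50K, VIVAE), Music (FMA, MTT, IRMAS, MS-DB), and Speech (RAVDESS, A-MNIST, SLURP, EMOVO).

Model | Size | ESC-50 | US8K | FSD50K | VIVAE | FMA | MTT | IRMAS | MS-DB | RAVDESS | A-MNIST | SLURP | EMOVO |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|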
facebook/wav2vec2-base | B | 45.73 | 55.48 | 19.39 | 31.47 | 50.54 | 37.56 | 35.14 | 66.06 | 55.32 | 86.38 | 14.37 | 31.80 |
microsoft/wavlm-base | B | 49.88 | 61.84 | 17.63 | 36.31 | 48.71 | 34.93 | 32.62 | 54.18 | 67.94 | 99.50 | 30.98 | 43.08 |
microsoft/wavlm-base-plus | B | 58.73 | 64.07 | 21.57 | 36.17 | 56.17 | 38.24 | 35.76 | 57.51 | 52.20 | 99.63 | 28.06 | 36.73 |
facebook/hubert-base-ls960 | B | 58.90 | 67.28 | 24.53 | 40.48 | 54.63 | 38.78 | 36.65 | 58.46 | 65.28 | 99.58 | 33.75 | 40.48 |
facebook/data2vec-audio-base | B | 23.63 | 45.63 | 10.06 | 30.19 | 40.58 | 27.60 | 25.87 | 50.74 | 48.03 | 99.06 | 43.57 | 27.27 |
ALM/wav2vec2-base-audioset | B | 52.61 | 70.48 | 21.29 | 31.26 | 59.50 | 37.92 | 35.85 | 64.61 | 45.94 | 88.09 | 11.00 | 30.83 |
ALM/hubert-base-audioset | B | 68.80 | 79.09 | 31.05 | 40.06 | 65.87 | 43.44 | 47.67 | 67.81 | 63.54 | 98.84 | 20.53 | 33.39 |
facebook/wav2vec2-large-robust | L | 13.13 | 42.70 | 5.80 | 22.01 | 41.71 | 20.95 | 19.91 | 50.23 | 11.57 | 45.74 | 7.33 | 19.27 |
facebook/wav2vec2-xls-r-300m | L | 51.28 | 69.96 | 23.71 | 36.28 | 56.96 | 38.28 | 38.42 | 66.71 | 31.48 | 98.88 | 12.74 | 20.35 |
microsoft/wavlm-large | L | 67.20 | 70.92 | 32.21 | 42.51 | 61.13 | 41.29 | 42.53 | 68.00 | 71.76 | 99.75 | 42.34 | 45.29 |
facebook/hubert-large-ll60k | L | 63.98 | 70.00 | 29.51 | 40.95 | 54.79 | 38.36 | 36.81 | 64.08 | 72.57 | 99.95 | 45.26 | 43.76 |
facebook/data2vec-audio-large | L | 25.35 | 49.15 | 10.82 | 30.57 | 43.46 | 28.52 | 27.08 | 44.20 | 45.14 | 99.15 | 28.60 | 23.07 |
ALM/wav2vec2-large-audioset | L | 74.39 | 79.00 | 37.58 | 39.65 | 66.58 | 44.51 | 49.87 | 76.90 | 59.49 | 99.42 | 17.74 | 38.20 |
ALM/hubert-large-audioset | L | 71.52 | 75.63 | 37.41 | 44.28 | 67.54 | 43.35 | 50.46 | 77.82 | 73.26 | 99.59 | 20.46 | 38.61 |
facebook/wav2vec2-xls-r-1b | XL | 66.95 | 75.90 | 31.61 | 40.41 | 62.79 | 41.99 | 43.57 | 69.79 | 55.44 | 99.86 | 25.14 | 34.58 |
facebook/hubert-xlarge-ll60k | XL | 63.40 | 69.66 | 29.32 | 42.72 | 56.25 | 37.76 | 37.30 | 64.71 | 75.69 | 99.95 | 47.81 | 47.17 |
The best-performing model per size is highlighted in bold. The best-performing model overall is underlined.
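
The per-size and overall selection described above can be reproduced from the table with a few lines of Python. The following is a minimal sketch, not part of ARCH itself: the names `ESC50_SCORES`, `best_per_size`, and `best_overall` are hypothetical, only the ESC-50 column is copied from the table, and it assumes that a higher score is better.

```python
# ESC-50 scores copied from the table above as (model, size, score).
# Assumption: higher scores are better.
ESC50_SCORES = [
    ("facebook/wav2vec2-base", "B", 45.73),
    ("microsoft/wavlm-base", "B", 49.88),
    ("microsoft/wavlm-base-plus", "B", 58.73),
    ("facebook/hubert-base-ls960", "B", 58.90),
    ("facebook/data2vec-audio-base", "B", 23.63),
    ("ALM/wav2vec2-base-audioset", "B", 52.61),
    ("ALM/hubert-base-audioset", "B", 68.80),
    ("facebook/wav2vec2-large-robust", "L", 13.13),
    ("facebook/wav2vec2-xls-r-300m", "L", 51.28),
    ("microsoft/wavlm-large", "L", 67.20),
    ("facebook/hubert-large-ll60k", "L", 63.98),
    ("facebook/data2vec-audio-large", "L", 25.35),
    ("ALM/wav2vec2-large-audioset", "L", 74.39),
    ("ALM/hubert-large-audioset", "L", 71.52),
    ("facebook/wav2vec2-xls-r-1b", "XL", 66.95),
    ("facebook/hubert-xlarge-ll60k", "XL", 63.40),
]


def best_per_size(rows):
    """Return {size: (model, score)} holding the top score in each size group."""
    best = {}
    for model, size, score in rows:
        if size not in best or score > best[size][1]:
            best[size] = (model, score)
    return best


def best_overall(rows):
    """Return (model, score) for the single highest score across all sizes."""
    model, _size, score = max(rows, key=lambda row: row[2])
    return model, score


if __name__ == "__main__":
    for size, (model, score) in best_per_size(ESC50_SCORES).items():
        print(f"best {size}: {model} ({score:.2f})")
    model, score = best_overall(ESC50_SCORES)
    print(f"best overall: {model} ({score:.2f})")
```

Repeating the same selection for each of the other eleven dataset columns would give the full set of bold and underlined entries.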