pyannote.audio is an open-source toolkit for speaker diarization.
Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.
Training is made possible thanks to the Jean Zay supercomputer.
pyannoteAI offers more accurate and faster enterprise options, which can be tried for free on our playground.

| Benchmark | v2.1 | v3.1 | pyannoteAI |
|---|---|---|---|
| AISHELL-4 | 14.1 | 12.2 | 11.2 |
| AliMeeting (channel 1) | 27.4 | 24.4 | 19.3 |
| AMI (IHM) | 18.9 | 18.8 | 15.8 |
| AMI (SDM) | 27.1 | 22.4 | 19.3 |
| AVA-AVD | 66.3 | 50.0 | 44.8 |
| CALLHOME (part 2) | 31.6 | 28.4 | 19.8 |
| DIHARD 3 (full) | 26.9 | 21.7 | 16.8 |
| Earnings21 | 17.0 | 9.4 | 9.1 |
| Ego4D (dev.) | 61.5 | 51.2 | 44.0 |
| MSDWild | 32.8 | 25.3 | 19.8 |
| RAMC | 22.5 | 22.2 | 11.1 |
| REPERE (phase2) | 8.2 | 7.8 | 7.6 |
| VoxConverse (v0.3) | 11.2 | 11.3 | 9.8 |

Diarization error rate (in %)

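
For readers unfamiliar with the metric, diarization error rate is commonly defined as the sum of false alarm, missed detection, and speaker confusion durations divided by the total duration of reference speech. The sketch below is a minimal illustration of that arithmetic, not the evaluation code used to produce the table. The function names and the example durations are hypothetical; only the two DIHARD 3 values come from the table above.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds. `total_speech` is the total
    duration of speech in the reference annotation.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Return the relative DER reduction (in percent) of new_der over baseline_der."""
    return 100.0 * (baseline_der - new_der) / baseline_der


# Hypothetical error durations for a recording with 600 s of reference speech.
example_der = diarization_error_rate(
    false_alarm=20.0, missed_detection=30.0, confusion=25.0, total_speech=600.0
)
print(f"DER = {example_der:.1f}%")

# DIHARD 3 (full) values taken from the table: v3.1 vs. pyannoteAI.
dihard_gain = relative_reduction(21.7, 16.8)
print(f"Relative DER reduction on DIHARD 3 (full): {dihard_gain:.1f}%")
```

Because false alarms count as errors while the denominator only covers reference speech, the rate can exceed 100% on very difficult recordings.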
Using high-end NVIDIA hardware,
- v2.1 takes around 1m30s to process 1h of audio
- v3.1 takes around 1m20s to process 1h of audio
- On-premise pyannoteAI takes less than 30s to process 1h of audio
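
The timings above can also be read as real-time factors (processing time divided by audio duration). The following sketch only converts the figures quoted in the list; the helper name and dictionary keys are illustrative, and the pyannoteAI entry uses 30 s as an upper bound since the list states "less than 30s".

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 60 * 60  # 1 hour of audio

# Processing times in seconds, taken from the list above.
timings = {
    "v2.1": 90,                 # around 1m30s
    "v3.1": 80,                 # around 1m20s
    "pyannoteAI (on-premise)": 30,  # upper bound: "less than 30s"
}

for name, seconds in timings.items():
    rtf = real_time_factor(seconds, AUDIO_SECONDS)
    speedup = 1.0 / rtf  # how many times faster than real time
    print(f"{name}: real-time factor {rtf:.4f}, about {speedup:.0f}x faster than real time")
```

These numbers depend on the hardware, so they are best treated as rough orders of magnitude rather than guarantees.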