license: apache-2.0

This repository contains the model checkpoints for the paper "Less is More for Synthetic Speech Detection in the Wild".

The dataset can be downloaded from here.

🔥 Key Features

  • 3000+ hours of synthetic speech
  • Diverse Distribution Shifts: The dataset spans 7 key distribution shifts, including:
    • 📖 Reading Style
    • 🎙️ Podcast
    • 🎥 YouTube
    • 🗣️ Languages (three different languages)
    • 🌎 Demographics (including variations in age, accent, and gender)
  • Multiple Speech Generation Systems: Includes data synthesized from various TTS models and vocoders.

💡 Why We Built This Dataset

Driven by advances in self-supervised learning for speech, state-of-the-art synthetic speech detectors have achieved low error rates on popular benchmarks such as ASVspoof. However, prior benchmarks do not address the wide range of real-world variability in speech. Are reported error rates realistic in real-world conditions? To assess detector failure modes and robustness under controlled distribution shifts, we introduce ShiftySpeech, a benchmark with more than 3000 hours of synthetic speech from 7 domains, 6 TTS systems, 12 vocoders, and 3 languages.

🚀 Stay tuned! More model checkpoints will be available soon.