---
license: apache-2.0
---

This repository contains the model checkpoints for the paper [Less is More for Synthetic Speech Detection in the Wild](https://arxiv.org/abs/2502.05674).
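
If you prefer to fetch a checkpoint programmatically rather than through the browser, here is a minimal, unofficial sketch using `huggingface_hub`. The repository id and checkpoint filename below are placeholders (this card does not list them), so substitute the actual values shown under *Files and versions*.

```python
# Minimal sketch: fetch one checkpoint file from this model repository.
# REPO_ID and FILENAME are hypothetical placeholders; replace them with the
# actual repository id and checkpoint filename listed under "Files and versions".
from huggingface_hub import hf_hub_download

REPO_ID = "<user>/<this-model-repo>"   # placeholder, not a real repo id
FILENAME = "checkpoint.pt"             # placeholder filename

checkpoint_path = hf_hub_download(repo_id=REPO_ID, filename=FILENAME)
print(f"Checkpoint saved to: {checkpoint_path}")
```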

The dataset can be downloaded from [here](https://huggingface.co/datasets/ash56/ShiftySpeech/tree/main).
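
Below is a minimal, unofficial sketch of pulling the dataset snapshot with `huggingface_hub`; `local_dir` is just an example path, and `allow_patterns` can be used to restrict the download to a subset rather than the full 3000+ hours.

```python
# Minimal sketch: download the ShiftySpeech dataset repository to a local folder.
# local_dir is an example path; pass allow_patterns=[...] to fetch only a subset
# instead of the full snapshot.
from huggingface_hub import snapshot_download

dataset_path = snapshot_download(
    repo_id="ash56/ShiftySpeech",
    repo_type="dataset",
    local_dir="ShiftySpeech",
)
print(f"Dataset downloaded to: {dataset_path}")
```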
## 🔥 Key Features

- 3000+ hours of synthetic speech
- **Diverse Distribution Shifts**: The dataset spans **7 key distribution shifts**, including:
  - 📖 **Reading Style**
  - 🎙️ **Podcast**
  - 🎥 **YouTube**
  - 🗣️ **Languages** (three different languages)
  - 📊 **Demographics** (including variations in age, accent, and gender)
- **Multiple Speech Generation Systems**: Includes data synthesized from various **TTS models** and **vocoders**.

## 💡 Why We Built This Dataset

> Driven by advances in self-supervised learning for speech, state-of-the-art synthetic speech detectors have achieved low error rates on popular benchmarks such as ASVspoof. However, prior benchmarks do not address the wide range of real-world variability in speech. Are reported error rates realistic in real-world conditions? To assess detector failure modes and robustness under controlled distribution shifts, we introduce **ShiftySpeech**, a benchmark with more than 3000 hours of synthetic speech from 7 domains, 6 TTS systems, 12 vocoders, and 3 languages.

🚀 **Stay tuned! More model checkpoints will be available soon.**