Papers
arxiv:2409.13049

DiffSSD: A Diffusion-Based Dataset For Speech Forensics

Published on Sep 19, 2024
Authors:
,
,
,

Abstract

Diffusion-based speech generators are ubiquitous. These methods can generate very high quality synthetic speech and several recent incidents report their malicious use. To counter such misuse, <PRE_TAG>synthetic speech detectors</POST_TAG> have been developed. Many of these detectors are trained on datasets which do not include diffusion-based synthesizers. In this paper, we demonstrate that existing detectors trained on one such dataset, ASVspoof2019, do not perform well in detecting synthetic speech from recent diffusion-based synthesizers. We propose the Diffusion-Based Synthetic Speech Dataset (DiffSSD), a dataset consisting of about 200 hours of labeled speech, including synthetic speech generated by 8 diffusion-based open-source and 2 commercial generators. We also examine the performance of existing <PRE_TAG>synthetic speech detectors</POST_TAG> on DiffSSD in both closed-set and open-set scenarios. The results highlight the importance of this dataset in detecting synthetic speech generated from recent open-source and commercial speech generators.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2409.13049 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2409.13049 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.