arxiv:2407.14358

Stable Audio Open

Published on Jul 19

Submitted by akhaliq on Jul 22

Authors: Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Abstract

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FD_openl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1 kHz.
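
The FD_openl3 metric named in the abstract is, as commonly defined for text-to-audio evaluation, the Fréchet distance between multivariate Gaussians fitted to OpenL3 embeddings of reference audio and generated audio: FD = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The Python sketch that follows shows only that distance computation. It assumes the embeddings have already been extracted as NumPy arrays with one row per clip. The function names `frechet_distance` and `fd_from_embeddings` are hypothetical and are not taken from the authors' evaluation code. The sketch also does not reproduce the paper's embedding settings or its handling of stereo audio.

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # Tr((sigma1 @ sigma2)^(1/2)): a product of two PSD matrices has real,
    # non-negative eigenvalues, so the trace of its principal square root is
    # the sum of the square roots of those eigenvalues. Tiny negative or
    # imaginary parts from round-off are discarded.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)


def fd_from_embeddings(reference, generated):
    """Fit a Gaussian to each embedding set (rows = clips) and compare them.

    Assumes embeddings (e.g. from OpenL3) were computed beforehand.
    """
    reference = np.asarray(reference, dtype=float)
    generated = np.asarray(generated, dtype=float)
    return frechet_distance(
        reference.mean(axis=0), np.cov(reference, rowvar=False),
        generated.mean(axis=0), np.cov(generated, rowvar=False),
    )
```

Lower values mean the generated embeddings are distributed more like the reference set. Computing the trace term from eigenvalues avoids a SciPy dependency. Many published Fréchet-distance implementations instead use `scipy.linalg.sqrtm` for the matrix square root.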

Community

akhaliq

Paper submitter Jul 22

https://stability-ai.github.io/stable-audio-open-demo/

HiDolen

Jul 22

Nice work

Jul 22

Is this only about speech?

1nader

Jul 22

🔥🔥🔥

Models citing this paper 2

Datasets citing this paper 0

Spaces citing this paper 26

Collections including this paper 17