ProDiff and FastDiff Model Card

Key Features

  • Extremely fast diffusion-based text-to-speech synthesis pipeline, suitable for potential industrial deployment.
  • Tutorial and code base for speech diffusion models.
  • Support for more diffusion mechanisms (e.g., guided diffusion) is planned.

Model Details

  • Model type: Diffusion-based text-to-speech generation model

  • Language(s): English

  • Model Description: A conditional diffusion probabilistic model capable of efficiently generating high-fidelity speech.

  • Resources for more information: FastDiff GitHub Repository, FastDiff Paper, ProDiff GitHub Repository, ProDiff Paper.

  • Cite as:

    @inproceedings{huang2022prodiff,
       title={ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech},
       author={Huang, Rongjie and Zhao, Zhou and Liu, Huadai and Liu, Jinglin and Cui, Chenye and Ren, Yi},
       booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
       year={2022}
    }

    @inproceedings{huang2022fastdiff,
       title={FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis},
       author={Huang, Rongjie and Lam, Max WY and Wang, Jun and Su, Dan and Yu, Dong and Ren, Yi and Zhao, Zhou},
       booktitle = {Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},
       year={2022}
    }

This model card was written based on the DALL-E Mini model card.
