Chong Zhang committed on
Commit 6ec1e51 · verified · 1 Parent(s): ab60f0d
Files changed (1)
  1. README.md +29 -61
README.md CHANGED
@@ -2,9 +2,11 @@
  license: apache-2.0
  language:
  - en
  tags:
- - music-generation
  ---
  [//]: # (# InspireMusic)
  <p align="center">
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
@@ -25,12 +27,12 @@ tags:
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
  <img alt="Code" src="https://img.shields.io/badge/Code%20⭐-InspireMusic?labelColor=%20%237372EB&label=InspireMusic&color=%20%235462eb"></a>

- <!-- <a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">
- <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-lightgrey"></a>

  <a href="https://arxiv.org/abs/" target="_blank">
  <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
- <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank"> -->

  [//]: # (<a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">)
@@ -69,20 +71,22 @@ tags:

  InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation using the PyTorch library.

  ---
  <a name="Highligts"></a>
  ## Highlights
  **InspireMusic** focuses on music generation, song generation and audio generation.
- - A unified framework for music/song/audio generation.
- - Controllable with text prompts, music genres, music structures, etc.
- - Convenient Fine-tuning and Inference: Provide convenient fine-tuning and inference scripts and strategies, allowing users to easily their music generation models.
- - Support text-to-music, music continuation tasks with high audio quality.

  <a name="What's News"></a>
  ## What's New 🔥

- [//]: # (- 2024/12: The [InspireMusic-Base]&#40;https://huggingface.co/FunAudioLLM/InspireMusic-Base&#41; voice understanding model is open-sourced, which offers high quality, diverse text style, music structure, music genre control capability. )
  - 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
  - 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure, genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.

@@ -156,20 +160,22 @@ If you are an expert in this field, and you are only interested in training your
  ``` sh
  # Download the model via git; make sure git lfs is installed
  mkdir -p pretrained_models
- git clone https://www.modelscope.cn/iic/InspireMusic.git pretrained_models/InspireMusic-Base
  ```

  ### Available Models
- Currently, we open source the music generation models only that supporting 24KHz mono channel audio.
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.

- | Model name | Model Links | Remarks |
- |------------------------------|-------------------------------------------------------|-------------|
- | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 24kHz mono |
- | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Music Generation 1.5B Model, 24kHz mono |
- | InspireSong-Base | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation Base Model, 24kHz mono |
- | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 24kHz mono |
- | InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 24kHz mono |

  ## Basic Usage

@@ -185,7 +191,7 @@ bash run.sh

  ### Training

- Here is an example to train LLM model.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
@@ -206,7 +212,7 @@ torchrun --nnodes=1 --nproc_per_node=8 \
  --fp16
  ```

- Here is an example code to train flow matching model.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
@@ -223,8 +229,7 @@ torchrun --nnodes=1 --nproc_per_node=8 \
  --prefetch 100 \
  --pin_memory \
  --deepspeed_config ./conf/ds_stage2.json \
- --deepspeed.save_states model+optimizer \
- --fp16
  ```

  ### Inference
@@ -268,48 +273,11 @@ for task in 'text-to-music' 'continuation'; do
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
  --result_dir `pwd`/exp/inspiremusic/${task}_test \
  --chorus verse \
- --fast True \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
  done
  ```

- ## Community & Discussion
- * Please support our community project 🌟 by starring it on GitHub 🙏
- * Welcome to join our DingTalk and WeChat groups to share and discuss algorithms, technology, and user experience feedback. You may scan the following QR codes to join our official chat groups accordingly.
-
- <p align="center">
- <table>
- <tr>
- <td style="text-align:center;">
- <a href="./asset/QR.jpg"><img alt="FunAudioLLM in DingTalk" src="https://img.shields.io/badge/FunAudioLLM-DingTalk-d9d9d9"></a>
- </td>
- <td style="text-align:center;">
- <a href="./asset/QR.jpg"><img alt="InspireMusic in WeChat" src="https://img.shields.io/badge/InspireMusic-WeChat-d9d9d9"></a>
- </td>
- </tr>
- <tr>
- <td style="text-align:center;">
- <img alt="Light" src="./asset/dingding.png" width="68%" />
- <td style="text-align:center;">
- <img alt="Light" src="./asset/QR.jpg" width="58%" />
- </td>
- </tr>
- </table>
- </p>
-
- * [Github Discussion](https://github.com/FunAudioLLM/InspireMusic/discussions). Best for sharing feedback and asking questions.
- * [GitHub Issues](https://github.com/FunAudioLLM/InspireMusic/issues). Best for bugs you encounter using InspireMusic, and feature proposals.
-
- ## Acknowledge
-
- 1. We borrowed a lot of code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice).
- 3. We borrowed a lot of code from [WavTokenizer](https://github.com/jishengpeng/WavTokenizer).
- 4. We borrowed a lot of code from [AcademiCodec](https://github.com/yangdongchao/AcademiCodec).
- 5. We borrowed a lot of code from [FunASR](https://github.com/modelscope/FunASR).
- 6. We borrowed a lot of code from [FunCodec](https://github.com/modelscope/FunCodec).
- 7. We borrowed a lot of code from [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS).
- 9. We borrowed a lot of code from [WeNet](https://github.com/wenet-e2e/wenet).
-
  ## Disclaimer
- The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.

  license: apache-2.0
  language:
  - en
+ pipeline_tag: text-to-audio
  tags:
+ - music_generation
  ---
+
  [//]: # (# InspireMusic)
  <p align="center">
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
 
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
  <img alt="Code" src="https://img.shields.io/badge/Code%20⭐-InspireMusic?labelColor=%20%237372EB&label=InspireMusic&color=%20%235462eb"></a>

+ <a href="https://modelscope.cn/models/iic/InspireMusic-1.5B-Long" target="_blank">
+ <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-green"></a>

  <a href="https://arxiv.org/abs/" target="_blank">
  <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
+ <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">

  [//]: # (<a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">)
 

  InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation using the PyTorch library.

+ ![GitHub Repo stars](https://img.shields.io/github/stars/FunAudioLLM/InspireMusic) Please support our community project 💖 by giving it a ⭐ on GitHub 🙏

  ---
  <a name="Highligts"></a>
  ## Highlights
  **InspireMusic** focuses on music generation, song generation and audio generation.
+ - A unified framework for music, song, and audio generation, controllable with text prompts, music genres, music structures, etc.
+ - Supports text-to-music, music continuation, audio super-resolution, and audio reconstruction tasks with high audio quality, at sampling rates of 24kHz and 48kHz.
+ - Supports long-form audio generation.
+ - Convenient fine-tuning and inference: provides fine-tuning and inference scripts and strategies with mixed-precision training (FP16, FP32), allowing users to easily fine-tune their music generation models.

  <a name="What's News"></a>
  ## What's New 🔥

+ - 2025/01: Open-sourced the [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), and [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation.
+ - 2024/12: Added support for generating 48kHz audio with super-resolution flow matching.
  - 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
  - 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure, genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.

 
  ``` sh
  # Download the model via git; make sure git lfs is installed
  mkdir -p pretrained_models
+ git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic
  ```
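
If you prefer the Hugging Face mirror listed in the table below, the same checkpoint can also be fetched with the `huggingface_hub` command-line tool. This is a minimal sketch, assuming the CLI is installed (`pip install "huggingface_hub[cli]"`) and that `FunAudioLLM/InspireMusic-1.5B-Long` mirrors the ModelScope checkpoint:

```sh
# Sketch: download the checkpoint from the Hugging Face hub instead of cloning via git
huggingface-cli download FunAudioLLM/InspireMusic-1.5B-Long --local-dir pretrained_models/InspireMusic
```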

  ### Available Models
+ Currently, we open-source music generation models that support 24kHz mono and 48kHz stereo audio.
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.

+ | Model name | Model Links | Remarks |
+ |-------------------------|------------------------------------------------------------------|---------------------------------------------------------------------|
+ | InspireMusic-Base-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono |
+ | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz |
+ | InspireMusic-1.5B-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono |
+ | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz |
+ | InspireMusic-1.5B-Long | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, supports long audio |
+ | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
+ | InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |
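
The other released checkpoints in the table can be downloaded the same way by swapping the model name into the clone command above. The sketch below assumes the ModelScope git URL follows the same `https://www.modelscope.cn/iic/<model-name>.git` pattern, using `InspireMusic-Base-24kHz` and its target directory purely as an example:

```sh
# Sketch: fetch a different checkpoint from the table above (requires git lfs)
git lfs install
git clone https://www.modelscope.cn/iic/InspireMusic-Base-24kHz.git pretrained_models/InspireMusic-Base-24kHz
```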
179
 
180
  ## Basic Usage
181
 
 

  ### Training

+ Here is an example of training the LLM model; FP16 training is supported.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
 
  --fp16
  ```

+ Here is an example of training the flow matching model, which does not support FP16 training.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
 
  --prefetch 100 \
  --pin_memory \
  --deepspeed_config ./conf/ds_stage2.json \
+ --deepspeed.save_states model+optimizer
  ```

  ### Inference
 
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
  --result_dir `pwd`/exp/inspiremusic/${task}_test \
  --chorus verse \
+ --fast \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
  done
  ```
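
After the loop finishes, the generated audio should land under the `--result_dir` paths used above. The sketch below is one way to sanity-check the output; it assumes `ffprobe` (from ffmpeg) is available and that results are written as WAV files, which may differ depending on configuration:

```sh
# Sketch: list the text-to-music results and print basic audio info for each file
ls exp/inspiremusic/text-to-music_test/
for f in exp/inspiremusic/text-to-music_test/*.wav; do
  ffprobe -hide_banner "$f"   # shows codec, sample rate, channels, duration
done
```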

  ## Disclaimer
+ The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.