Chong Zhang committed on
Commit 6ec1e51 · verified · 1 Parent(s): ab60f0d
Files changed (1)
  1. README.md +29 -61
README.md CHANGED
@@ -2,9 +2,11 @@
  license: apache-2.0
  language:
  - en
  tags:
- - music-generation
  ---
  [//]: # (# InspireMusic)
  <p align="center">
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
@@ -25,12 +27,12 @@ tags:
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
  <img alt="Code" src="https://img.shields.io/badge/Code%20⭐-InspireMusic?labelColor=%20%237372EB&label=InspireMusic&color=%20%235462eb"></a>

- <!-- <a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">
- <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-lightgrey"></a>

  <a href="https://arxiv.org/abs/" target="_blank">
  <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
- <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank"> -->

  [//]: # (<a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">)
@@ -69,20 +71,22 @@ tags:

  InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation using the PyTorch library.

  ---
  <a name="Highligts"></a>
  ## Highlights
  **InspireMusic** focuses on music generation, song generation and audio generation.
- - A unified framework for music/song/audio generation.
- - Controllable with text prompts, music genres, music structures, etc.
- - Convenient Fine-tuning and Inference: Provide convenient fine-tuning and inference scripts and strategies, allowing users to easily their music generation models.
- - Support text-to-music, music continuation tasks with high audio quality.

  <a name="What's News"></a>
  ## What's New 🔥

- [//]: # (- 2024/12: The [InspireMusic-Base]&#40;https://huggingface.co/FunAudioLLM/InspireMusic-Base&#41; voice understanding model is open-sourced, which offers high quality, diverse text style, music structure, music genre control capability. )
  - 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
  - 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure, genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.

@@ -156,20 +160,22 @@ If you are an expert in this field, and you are only interested in training your
  ``` sh
  # Download the model via git; make sure git lfs is installed
  mkdir -p pretrained_models
- git clone https://www.modelscope.cn/iic/InspireMusic.git pretrained_models/InspireMusic-Base
  ```

  ### Available Models
- Currently, we open source the music generation models only that supporting 24KHz mono channel audio.
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.

- | Model name | Model Links | Remarks |
- |------------------------------|-------------------------------------------------------|-------------|
- | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 24kHz mono |
- | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Music Generation 1.5B Model, 24kHz mono |
- | InspireSong-Base | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation Base Model, 24kHz mono |
- | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 24kHz mono |
- | InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 24kHz mono |

  ## Basic Usage

@@ -185,7 +191,7 @@ bash run.sh

  ### Training

- Here is an example to train LLM model.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
@@ -206,7 +212,7 @@ torchrun --nnodes=1 --nproc_per_node=8 \
  --fp16
  ```

- Here is an example code to train flow matching model.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
@@ -223,8 +229,7 @@ torchrun --nnodes=1 --nproc_per_node=8 \
  --prefetch 100 \
  --pin_memory \
  --deepspeed_config ./conf/ds_stage2.json \
- --deepspeed.save_states model+optimizer \
- --fp16
  ```

  ### Inference
@@ -268,48 +273,11 @@ for task in 'text-to-music' 'continuation'; do
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
  --result_dir `pwd`/exp/inspiremusic/${task}_test \
  --chorus verse \
- --fast True \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
  done
  ```

- ## Community & Discussion
- * Please support our community project 🌟 by starring it on GitHub 🙏
- * Welcome to join our DingTalk and WeChat groups to share and discuss algorithms, technology, and user experience feedback. You may scan the following QR codes to join our official chat groups accordingly.
-
- <p align="center">
- <table>
- <tr>
- <td style="text-align:center;">
- <a href="./asset/QR.jpg"><img alt="FunAudioLLM in DingTalk" src="https://img.shields.io/badge/FunAudioLLM-DingTalk-d9d9d9"></a>
- </td>
- <td style="text-align:center;">
- <a href="./asset/QR.jpg"><img alt="InspireMusic in WeChat" src="https://img.shields.io/badge/InspireMusic-WeChat-d9d9d9"></a>
- </td>
- </tr>
- <tr>
- <td style="text-align:center;">
- <img alt="Light" src="./asset/dingding.png" width="68%" />
- <td style="text-align:center;">
- <img alt="Light" src="./asset/QR.jpg" width="58%" />
- </td>
- </tr>
- </table>
- </p>
-
- * [Github Discussion](https://github.com/FunAudioLLM/InspireMusic/discussions). Best for sharing feedback and asking questions.
- * [GitHub Issues](https://github.com/FunAudioLLM/InspireMusic/issues). Best for bugs you encounter using InspireMusic, and feature proposals.
-
- ## Acknowledge
-
- 1. We borrowed a lot of code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice).
- 3. We borrowed a lot of code from [WavTokenizer](https://github.com/jishengpeng/WavTokenizer).
- 4. We borrowed a lot of code from [AcademiCodec](https://github.com/yangdongchao/AcademiCodec).
- 5. We borrowed a lot of code from [FunASR](https://github.com/modelscope/FunASR).
- 6. We borrowed a lot of code from [FunCodec](https://github.com/modelscope/FunCodec).
- 7. We borrowed a lot of code from [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS).
- 9. We borrowed a lot of code from [WeNet](https://github.com/wenet-e2e/wenet).
-
  ## Disclaimer
- The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.

  license: apache-2.0
  language:
  - en
+ pipeline_tag: text-to-audio
  tags:
+ - music_generation
  ---
+
  [//]: # (# InspireMusic)
  <p align="center">
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
 
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
  <img alt="Code" src="https://img.shields.io/badge/Code%20⭐-InspireMusic?labelColor=%20%237372EB&label=InspireMusic&color=%20%235462eb"></a>

+ <a href="https://modelscope.cn/models/iic/InspireMusic-1.5B-Long" target="_blank">
+ <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-green"></a>

  <a href="https://arxiv.org/abs/" target="_blank">
  <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
+ <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">

  [//]: # (<a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">)
 

  InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation using the PyTorch library.

+ ![GitHub Repo stars](https://img.shields.io/github/stars/FunAudioLLM/InspireMusic) Please support our community project 💖 by giving it a ⭐ on GitHub 🙏

  ---
  <a name="Highligts"></a>
  ## Highlights
  **InspireMusic** focuses on music generation, song generation and audio generation.
+ - A unified framework for music, song, and audio generation, controllable with text prompts, music genres, music structures, etc.
+ - Supports text-to-music, music continuation, audio super-resolution, and audio reconstruction tasks with high audio quality, at sampling rates of 24kHz and 48kHz.
+ - Supports long-form audio generation.
+ - Convenient fine-tuning and inference: provides fine-tuning and inference scripts and strategies with mixed-precision training (FP16, FP32), allowing users to easily fine-tune their music generation models.

  <a name="What's News"></a>
  ## What's New 🔥

+ - 2025/01: Open-sourced the [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), and [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation.
+ - 2024/12: Added support for generating 48kHz audio with super-resolution flow matching.
  - 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
  - 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure, genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.

 
  ``` sh
  # Download the model via git; make sure git lfs is installed
  mkdir -p pretrained_models
+ git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic
  ```
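
If you prefer the Hugging Face mirror listed in the table below, the same checkpoint can also be fetched with the `huggingface_hub` command-line tool. This is a minimal sketch, assuming the CLI is installed (`pip install "huggingface_hub[cli]"`) and that `FunAudioLLM/InspireMusic-1.5B-Long` mirrors the ModelScope checkpoint:

```sh
# Sketch: download the checkpoint from the Hugging Face hub instead of cloning via git
huggingface-cli download FunAudioLLM/InspireMusic-1.5B-Long --local-dir pretrained_models/InspireMusic
```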

  ### Available Models
+ Currently, we open-source music generation models that support 24kHz mono and 48kHz stereo audio.
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.

+ | Model name | Model Links | Remarks |
+ |-------------------------|------------------------------------------------------------------|---------------------------------------------------------------------|
+ | InspireMusic-Base-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono |
+ | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-Base/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz |
+ | InspireMusic-1.5B-24kHz | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono |
+ | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz |
+ | InspireMusic-1.5B-Long | [![model](https://img.shields.io/badge/ModelScope-Model-green.svg)](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-green.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, supports long audio |
+ | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
+ | InspireAudio-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |
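
The other released checkpoints in the table can be downloaded the same way by swapping the model name into the clone command above. The sketch below assumes the ModelScope git URL follows the same `https://www.modelscope.cn/iic/<model-name>.git` pattern, using `InspireMusic-Base-24kHz` and its target directory purely as an example:

```sh
# Sketch: fetch a different checkpoint from the table above (requires git lfs)
git lfs install
git clone https://www.modelscope.cn/iic/InspireMusic-Base-24kHz.git pretrained_models/InspireMusic-Base-24kHz
```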
179
 
180
  ## Basic Usage
181
 
 

  ### Training

+ Here is an example of training the LLM model; FP16 training is supported.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
 
  --fp16
  ```

+ Here is an example of training the flow matching model, which does not support FP16 training.
  ```sh
  torchrun --nnodes=1 --nproc_per_node=8 \
  --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
 
  --prefetch 100 \
  --pin_memory \
  --deepspeed_config ./conf/ds_stage2.json \
+ --deepspeed.save_states model+optimizer
  ```

  ### Inference
 
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
  --result_dir `pwd`/exp/inspiremusic/${task}_test \
  --chorus verse \
+ --fast \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
  done
  ```
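
After the loop finishes, the generated audio should land under the `--result_dir` paths used above. The sketch below is one way to sanity-check the output; it assumes `ffprobe` (from ffmpeg) is available and that results are written as WAV files, which may differ depending on configuration:

```sh
# Sketch: list the text-to-music results and print basic audio info for each file
ls exp/inspiremusic/text-to-music_test/
for f in exp/inspiremusic/text-to-music_test/*.wav; do
  ffprobe -hide_banner "$f"   # shows codec, sample rate, channels, duration
done
```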

  ## Disclaimer
+ The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.