Chong Zhang committed (verified) Β· Commit 30fe868 Β· 1 Parent(s): 0daf9d9

Upload README.md

Files changed (1): README.md (+121 βˆ’33)
@@ -1,46 +1,94 @@
- ---
- license: apache-2.0
- pipeline_tag: text-to-audio
- tags:
- - text-to-music
- - music-generation
- ---
-
- # InspireMusic
-
  <p align="center">
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
- <img alt="InspireMusic" src="https://svg-banners.vercel.app/api?type=origin&text1=Inspire%20Music🎢&text2=πŸ€—%20A%20Fundamental%20Music%20Song%20Audio%20Generation%20Toolkit&width=800&height=210"></a>
  </p>

- [**Code**](https://inspiremusic.github.io/) | [**Demo**](https://iris2c.github.io/InspireMusic/) | [**ModelScope**](https://modelscope.cn/models/iic/InspireMusic/summary) | [**HuggingFace**](https://huggingface.co/FunAudioLLM/InspireMusic-Base)

- InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation using the PyTorch library.

- ## Introduction
- > [!Note]
- > This repo contains the algorithm infrastructure and some simple examples.

- > [!Tip]
- > To explore the performance, please refer to [**InspireMusic Demo Page**](https://iris2c.github.io/InspireMusic). Space will also coming soon.

- InspireMusic is a unified music, song and audio generation framework through the audio tokenization and detokenization process integrated with a large autoregressive transformer. The original motive of this toolkit is to empower the common users to innovate soundscapes and enhance euphony in research through music, song, and audio crafting. The toolkit provides both inference and training code for AI generative models that create high-quality music. Featuring a unified framework, InspireMusic incorporates autoregressive Transformer and conditional flow-matching modeling (CFM), allowing for the controllable generation of music, songs, and audio with both textual and structural music conditioning, as well as neural audio tokenizers. Currently, the toolkit supports text-to-music generation and plans to expand its capabilities to include text-to-song and text-to-audio generation in the future.
  <a name="Highligts"></a>
  ## Highlights
  **InspireMusic** focuses on music generation, song generation and audio generation.
  - A unified framework for music/song/audio generation.
  - Controllable with text prompts, music genres, music structures, etc.
  - Convenient fine-tuning and inference: provides fine-tuning and inference scripts and strategies, allowing users to easily fine-tune their music generation models.
- - High audio quality.

  <a name="What's News"></a>
  ## What's New πŸ”₯

  [//]: # (- 2024/12: The [InspireMusic-Base]&#40;https://huggingface.co/FunAudioLLM/InspireMusic-Base&#41; voice understanding model is open-sourced, which offers high quality, diverse text style, music structure, music genre control capability. )
- - 2024/11: Welcome to preview πŸ‘‰πŸ» [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) πŸ‘ˆπŸ», more features and models will comming soon. We're excited to share this with you and are working hard to bring even more features and models very soon. Your support and feedback mean a lot to us!
  - 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure, genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.
  ## Installation

  ### Clone
@@ -106,12 +154,12 @@ git clone https://www.modelscope.cn/iic/InspireMusic.git pretrained_models/Inspi
  ```

  ### Available Models
- Currently, we open source the music generation models that only support 24KHz mono channel audio.
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.

  | Model name | Model Links | Remarks |
  |------------------------------|-------------------------------------------------------|-------------|
- | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-orange.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-orange.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 24kHz mono |
  | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Music Generation 1.5B Model, 24kHz mono |
  | InspireSong-Base | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation Base Model, 24kHz mono |
  | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 24kHz mono |
@@ -181,10 +229,11 @@ cd InspireMusic/examples/music_generation/
  bash infer.sh
  ```

- Here is an example code to run inference with flow matching model.
  ```sh
  pretrained_model_dir="./pretrained_models/InspireMusic/"
- python inspiremusic/bin/inference.py --mode sft \
  --gpu 0 \
  --config conf/inspiremusic.yaml \
  --prompt_data data/test/parquet/data.list \
@@ -192,16 +241,18 @@ python inspiremusic/bin/inference.py --mode sft \
  --llm_model $pretrained_model_dir/llm.pt \
  --music_tokenizer $pretrained_model_dir/music_tokenizer \
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
- --result_dir `pwd`/exp/inspiremusic/sft_test \
  --chorus verse \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
  ```

- Here is an example code to run inference without flow matching model.
  ```sh
  pretrained_model_dir="./pretrained_models/InspireMusic/"
- python inspiremusic/bin/inference.py --mode sft \
  --gpu 0 \
  --config conf/inspiremusic.yaml \
  --prompt_data data/test/parquet/data.list \
@@ -209,17 +260,44 @@ python inspiremusic/bin/inference.py --mode sft \
  --llm_model $pretrained_model_dir/llm.pt \
  --music_tokenizer $pretrained_model_dir/music_tokenizer \
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
- --no_flow_mode True \
- --result_dir `pwd`/exp/inspiremusic/sft_test \
  --chorus verse \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
  ```
  ### Friend Links
  Check out some awesome GitHub repositories from the Speech Lab of the Institute for Intelligent Computing, Alibaba Group.

- <p align="left">
  <a href="https://github.com/modelscope/ClearerVoice-Studio" target="_blank">
  <img alt="Demo" src="https://img.shields.io/badge/Repo | Space-ClearVoice?labelColor=&label=ClearVoice&color=green"></a>
  <a href="https://github.com/FunAudioLLM/CosyVoice" target="_blank">
@@ -236,10 +314,10 @@ Checkout some awesome Github repositories from Speech Lab of Institute for Intel
  <table>
  <tr>
  <td style="text-align:center;">
- <a href="./asset/QR.jpg"><img alt="FunAudioLLM in DingTalk" src="https://img.shields.io/badge/DingTalk-FunAudioLLM?labelColor=lightblue&label=FunAudioLLM&color=blue"></a>
  </td>
  <td style="text-align:center;">
- <a href="./asset/QR.jpg"><img alt="InspireMusic in WeChat" src="https://img.shields.io/badge/WeChat-InspireMusic?labelColor=lightblue&label=InspireMusic&color=blue"></a>
  </td>
  </tr>
  <tr>
@@ -255,5 +333,15 @@ Checkout some awesome Github repositories from Speech Lab of Institute for Intel
  * [GitHub Discussions](https://github.com/FunAudioLLM/InspireMusic/discussions). Best for sharing feedback and asking questions.
  * [GitHub Issues](https://github.com/FunAudioLLM/InspireMusic/issues). Best for bugs you encounter using InspireMusic, and feature proposals.
  ## Disclaimer
  The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.
 
+ [//]: # (# InspireMusic)
+
  <p align="center">
  <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
+ <img alt="logo" src="./asset/logo.png" width="100%"></a>
  </p>

+ [//]: # (<p align="center">)

+ [//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)

+ [//]: # ( <img alt="InspireMusic" src="https://svg-banners.vercel.app/api?type=origin&text1=Inspire%20Music🎢&text2=πŸ€—%20A%20Fundamental%20Music%20Song%20Audio%20Generation%20Toolkit&width=800&height=210"></a>)

+ [//]: # (</p>)

+ <p align="center">
+ <a href="https://iris2c.github.io/InspireMusic" target="_blank">
+ <img alt="Demo" src="https://img.shields.io/badge/Demo%20πŸ‘ˆπŸ»-InspireMusic?labelColor=%20%23FDB062&label=InspireMusic&color=%20%23f79009"></a>
+ <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
+ <img alt="Code" src="https://img.shields.io/badge/Code%20⭐-InspireMusic?labelColor=%20%237372EB&label=InspireMusic&color=%20%235462eb"></a>
+
+ <a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">
+ <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-lightgrey"></a>
+
+ <a href="https://arxiv.org/abs/" target="_blank">
+ <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
+
+ [//]: # (<a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">)
+
+ [//]: # ( <img alt="Model" src="https://img.shields.io/badge/Model-InspireMusic?labelColor=%20%23FDA199&label=InspireMusic&color=orange"></a>)
+
+ [//]: # (<a href="https://arxiv.org/abs/" target="_blank">)
+
+ [//]: # ( <img alt="Paper" src="https://img.shields.io/badge/Paper-arXiv?labelColor=%20%23528bff&label=arXiv&color=%20%23155EEF"></a>)
+
+ [//]: # (<a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)
+
+ [//]: # ( <img alt="GitHub Star" src="https://img.shields.io/github/stars/FunAudioLLM/InspireMusic"></a>)
+
+ [//]: # (<a href="https://github.com/FunAudioLLM/InspireMusic/blob/main/asset/QR.jpg" target="_blank">)
+
+ [//]: # ( <img src="https://img.shields.io/badge/group%20chat-group?&labelColor=%20%235462eb&color=%20%235462eb" alt="chat on WeChat"></a>)
+ [//]: # (<a href="https://discord.gg/nSPpRU7fRr" target="_blank">)
+
+ [//]: # ( <img src="https://img.shields.io/badge/discord-chat?&labelColor=%20%235462eb&color=%20%235462eb" alt="chat on Discord"></a>)
+
+ [//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)
+
+ [//]: # ( <img alt="Static Badge" src="https://img.shields.io/badge/v0.1-version?logo=free&color=%20%23155EEF&label=version&labelColor=%20%23528bff"></a>)
+ [//]: # (<a href="https://github.com/FunAudioLLM/InspireMusic/graphs/commit-activity" target="_blank">)
+
+ [//]: # (<img alt="Commits last month" src="https://img.shields.io/github/commit-activity/m/FunAudioLLM/InspireMusic?labelColor=%20%2332b583&color=%20%2312b76a"></a>)
+
+ [//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">)

+ [//]: # ( <img alt="Issues closed" src="https://img.shields.io/github/issues-search?query=repo%3AFunAudioLLM%2FInspireMusic%20is%3Aclosed&label=issues%20closed&labelColor=%20%237d89b0&color=%20%235d6b98"></a>)
+
+ [//]: # ( <a href="https://github.com/FunAudioLLM/InspireMusic/discussions/" target="_blank">)
+
+ [//]: # ( <img alt="Discussion posts" src="https://img.shields.io/github/discussions/FunAudioLLM/InspireMusic?labelColor=%20%239b8afb&color=%20%237a5af8"></a>)
+ </p>
+
+ InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation using the PyTorch library.
+
+ ![GitHub Repo stars](https://img.shields.io/github/stars/FunAudioLLM/InspireMusic) Please support our community project πŸ’– by starring it on GitHub ⭐ πŸ™
+
+ ---
  <a name="Highligts"></a>
  ## Highlights
  **InspireMusic** focuses on music generation, song generation and audio generation.
  - A unified framework for music/song/audio generation.
  - Controllable with text prompts, music genres, music structures, etc.
  - Convenient fine-tuning and inference: provides fine-tuning and inference scripts and strategies, allowing users to easily fine-tune their music generation models.
+ - Supports text-to-music and music continuation tasks with high audio quality.

  <a name="What's News"></a>
  ## What's New πŸ”₯

  [//]: # (- 2024/12: The [InspireMusic-Base]&#40;https://huggingface.co/FunAudioLLM/InspireMusic-Base&#41; voice understanding model is open-sourced, which offers high quality, diverse text style, music structure, music genre control capability. )
+ - 2024/11: Welcome to preview πŸ‘‰πŸ» [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) πŸ‘ˆπŸ». We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
  - 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure, genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.
+ ## Introduction
+ > [!Note]
+ > This repo contains the algorithm infrastructure and some simple examples.
+
+ > [!Tip]
+ > To explore the performance, please refer to the [InspireMusic Demo Page](https://iris2c.github.io/InspireMusic). We will open-source the InspireMusic models and a Hugging Face Space soon.
+
+ InspireMusic is a unified music, song, and audio generation framework built on audio tokenization and detokenization integrated with a large autoregressive transformer. The original motivation of this toolkit is to empower everyday users to innovate soundscapes and enhance euphony in research through music, song, and audio crafting. The toolkit provides both inference and training code for AI generative models that create high-quality music. Within this unified framework, InspireMusic combines an autoregressive transformer with conditional flow matching (CFM) and neural audio tokenizers, allowing controllable generation of music, songs, and audio conditioned on both text and musical structure. Currently, the toolkit supports text-to-music generation; text-to-song and text-to-audio generation are planned for the future.
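The tokenize β†’ autoregressive transformer β†’ detokenize pipeline described above can be sketched as a toy, self-contained example. Everything below (function names, the 75 Hz token rate, the dummy next-token rule) is illustrative only and is not the actual InspireMusic API:

```python
# Conceptual sketch of a tokenize -> autoregressive LM -> detokenize pipeline.
# All names and numbers here are illustrative, NOT the real InspireMusic code.

TOKENS_PER_SECOND = 75      # hypothetical audio-token rate
SAMPLE_RATE = 24000         # 24 kHz mono, as in the model table

def tokenize_prompt(text: str) -> list[int]:
    """Stand-in for a text tokenizer: map characters to token ids."""
    return [ord(c) % 256 for c in text]

def autoregressive_generate(prompt_tokens: list[int], seconds: int) -> list[int]:
    """Stand-in for the large AR transformer: emit one audio token per step."""
    out = list(prompt_tokens)
    for step in range(seconds * TOKENS_PER_SECOND):
        out.append((out[-1] + step) % 256)   # dummy "next token" rule
    return out[len(prompt_tokens):]

def detokenize(audio_tokens: list[int]) -> int:
    """Stand-in for flow matching + the neural codec: tokens -> waveform.
    Returns the number of output samples instead of real audio."""
    samples_per_token = SAMPLE_RATE // TOKENS_PER_SECOND
    return len(audio_tokens) * samples_per_token

tokens = autoregressive_generate(tokenize_prompt("a calm piano piece"), seconds=8)
num_samples = detokenize(tokens)
print(num_samples)  # 8 s * 24000 samples/s = 192000
```

In the real framework each stand-in is a neural component: the tokenizer produces discrete audio tokens, the AR transformer generates them conditioned on text and structure, and CFM plus the codec reconstruct the waveform.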
+
  ## Installation

  ### Clone
  ```

  ### Available Models
+ Currently, we open-source music generation models that support only 24 kHz mono audio.
  The table below presents the links to the ModelScope and Huggingface model hub. More models will be available soon.

  | Model name | Model Links | Remarks |
  |------------------------------|-------------------------------------------------------|-------------|
+ | InspireMusic-Base | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)](https://modelscope.cn/models/iic/InspireMusic/summary) [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 24kHz mono |
  | InspireMusic-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Music Generation 1.5B Model, 24kHz mono |
  | InspireSong-Base | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation Base Model, 24kHz mono |
  | InspireSong-1.5B | [![model](https://img.shields.io/badge/ModelScope-Model-lightgrey.svg)]() [![model](https://img.shields.io/badge/HuggingFace-Model-lightgrey.svg)]() | Pre-trained Song Generation 1.5B Model, 24kHz mono |
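After downloading a model, it can help to sanity-check that the local directory contains the pieces the inference commands reference. This helper is hypothetical; the expected names (`llm.pt`, `music_tokenizer`, `wavtokenizer`) are taken from the inference flags in this README, and the directory path is an assumption:

```python
# Hypothetical sanity check for a downloaded model directory.
# Expected names come from the documented inference flags
# (--llm_model, --music_tokenizer, --wavtokenizer); the path is an assumption.
from pathlib import Path

EXPECTED = ["llm.pt", "music_tokenizer", "wavtokenizer"]

def missing_parts(model_dir: str) -> list[str]:
    """Return the expected files/directories that are absent."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

print(missing_parts("./pretrained_models/InspireMusic"))
```

An empty list means the layout matches what the example commands expect.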

  bash infer.sh
  ```

+ Here is example code to run inference in normal mode, i.e., with the flow-matching model, for text-to-music and music continuation tasks.
  ```sh
  pretrained_model_dir="./pretrained_models/InspireMusic/"
+ for task in 'text-to-music' 'continuation'; do
+ python inspiremusic/bin/inference.py --task $task \
  --gpu 0 \
  --config conf/inspiremusic.yaml \
  --prompt_data data/test/parquet/data.list \
  --llm_model $pretrained_model_dir/llm.pt \
  --music_tokenizer $pretrained_model_dir/music_tokenizer \
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
+ --result_dir `pwd`/exp/inspiremusic/${task}_test \
  --chorus verse \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
+ done
  ```
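The `--min_generate_audio_seconds` and `--max_generate_audio_seconds` flags bound the length of the generated audio. A hypothetical helper (not part of the InspireMusic codebase) showing how a requested duration would be clamped into that window:

```python
def clamp_duration(requested: float, min_s: float = 8.0, max_s: float = 30.0) -> float:
    """Clamp a requested generation length into the [min_s, max_s] window
    used by the inference flags above. Purely illustrative."""
    if min_s > max_s:
        raise ValueError("min_s must not exceed max_s")
    return max(min_s, min(requested, max_s))

print(clamp_duration(5.0))    # below the window -> 8.0
print(clamp_duration(20.0))   # inside the window -> 20.0
print(clamp_duration(45.0))   # above the window -> 30.0
```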

+ Here is example code to run inference in fast mode, i.e., without the flow-matching model, for text-to-music and music continuation tasks.
  ```sh
  pretrained_model_dir="./pretrained_models/InspireMusic/"
+ for task in 'text-to-music' 'continuation'; do
+ python inspiremusic/bin/inference.py --task $task \
  --gpu 0 \
  --config conf/inspiremusic.yaml \
  --prompt_data data/test/parquet/data.list \
  --llm_model $pretrained_model_dir/llm.pt \
  --music_tokenizer $pretrained_model_dir/music_tokenizer \
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
+ --result_dir `pwd`/exp/inspiremusic/${task}_test \
  --chorus verse \
+ --fast True \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
+ done
  ```
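For scripting many runs, the flag set above can also be assembled programmatically. The flags mirror the README examples; the helper function itself is hypothetical:

```python
# Build the documented inference command as an argument list.
# The flags mirror the README examples above; this wrapper is hypothetical.
def build_inference_cmd(task: str, model_dir: str, fast: bool = False) -> list[str]:
    cmd = [
        "python", "inspiremusic/bin/inference.py",
        "--task", task,
        "--gpu", "0",
        "--config", "conf/inspiremusic.yaml",
        "--prompt_data", "data/test/parquet/data.list",
        "--llm_model", f"{model_dir}/llm.pt",
        "--music_tokenizer", f"{model_dir}/music_tokenizer",
        "--wavtokenizer", f"{model_dir}/wavtokenizer",
        "--result_dir", f"exp/inspiremusic/{task}_test",
        "--chorus", "verse",
        "--min_generate_audio_seconds", "8",
        "--max_generate_audio_seconds", "30",
    ]
    if fast:
        cmd += ["--fast", "True"]   # skip the flow-matching stage
    return cmd

print(" ".join(build_inference_cmd("text-to-music", "./pretrained_models/InspireMusic", fast=True)))
```

Passing the list to `subprocess.run` avoids shell-quoting issues when task names or paths contain special characters.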

+ ## Roadmap
+
+ - [ ] 2024/12
+     - [ ] 75Hz InspireMusic base model for music generation
+     - [ ] Support song generation task
+
+ - [ ] 2025/01
+     - [ ] 75Hz InspireSong model for song generation
+
+ - [ ] 2025/02
+     - [ ] Support audio generation task
+     - [ ] 75Hz InspireMusic-1.5B model for music generation
+
+ - [ ] 2025/03
+     - [ ] 75Hz InspireAudio model for music and audio generation
+
+ - [ ] TBD
+     - [ ] 25Hz InspireMusic model
+     - [ ] Support 48kHz stereo audio
+     - [ ] Streaming inference mode support
+     - [ ] Support more instruction modes, multilingual instructions
+     - [ ] InspireSong trained with more multilingual data
+     - [ ] More...
+
+ ---
  ### Friend Links
  Check out some awesome GitHub repositories from the Speech Lab of the Institute for Intelligent Computing, Alibaba Group.

+ <p align="center">
  <a href="https://github.com/modelscope/ClearerVoice-Studio" target="_blank">
  <img alt="Demo" src="https://img.shields.io/badge/Repo | Space-ClearVoice?labelColor=&label=ClearVoice&color=green"></a>
  <a href="https://github.com/FunAudioLLM/CosyVoice" target="_blank">

  <table>
  <tr>
  <td style="text-align:center;">
+ <a href="./asset/QR.jpg"><img alt="FunAudioLLM in DingTalk" src="https://img.shields.io/badge/FunAudioLLM-DingTalk-d9d9d9"></a>
  </td>
  <td style="text-align:center;">
+ <a href="./asset/QR.jpg"><img alt="InspireMusic in WeChat" src="https://img.shields.io/badge/InspireMusic-WeChat-d9d9d9"></a>
  </td>
  </tr>
  <tr>
  * [GitHub Discussions](https://github.com/FunAudioLLM/InspireMusic/discussions). Best for sharing feedback and asking questions.
  * [GitHub Issues](https://github.com/FunAudioLLM/InspireMusic/issues). Best for bugs you encounter using InspireMusic, and feature proposals.

+ ## Acknowledgements
+
+ 1. We borrowed a lot of code from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice).
+ 2. We borrowed a lot of code from [WavTokenizer](https://github.com/jishengpeng/WavTokenizer).
+ 3. We borrowed a lot of code from [AcademiCodec](https://github.com/yangdongchao/AcademiCodec).
+ 4. We borrowed a lot of code from [FunASR](https://github.com/modelscope/FunASR).
+ 5. We borrowed a lot of code from [FunCodec](https://github.com/modelscope/FunCodec).
+ 6. We borrowed a lot of code from [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS).
+ 7. We borrowed a lot of code from [WeNet](https://github.com/wenet-e2e/wenet).
+
  ## Disclaimer
  The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.