Chong Zhang committed: init

README.md
---
license: apache-2.0
language:
- en
pipeline_tag: text-to-audio
tags:
- music_generation
---

[//]: # (# InspireMusic)
<p align="center">
 <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
 <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
 <img alt="Code" src="https://img.shields.io/badge/Code%20⭐-InspireMusic?labelColor=%20%237372EB&label=InspireMusic&color=%20%235462eb"></a>

 <a href="https://modelscope.cn/models/iic/InspireMusic-1.5B-Long" target="_blank">
 <img alt="Model" src="https://img.shields.io/badge/InspireMusic-Model-green"></a>

 <a href="https://arxiv.org/abs/" target="_blank">
 <img alt="Paper" src="https://img.shields.io/badge/arXiv-Paper-lightgrey"></a>
 <a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">

[//]: # (<a href="https://huggingface.co/FunAudioLLM/InspireMusic-Base" target="_blank">)

InspireMusic is a fundamental AIGC toolkit designed for music, song, and audio generation, built on the PyTorch library.

Please support our community project 💖 by starring it on GitHub ⭐ 🙏

---
<a name="Highlights"></a>
## Highlights
**InspireMusic** focuses on music generation, song generation, and audio generation.
- A unified framework for music/song/audio generation, controllable with text prompts, music genres, music structures, etc.
- Supports text-to-music, music continuation, audio super-resolution, and audio reconstruction tasks with high audio quality, at sampling rates of 24kHz and 48kHz.
- Supports long-form audio generation.
- Convenient fine-tuning and inference: mixed-precision training (FP16, FP32) is supported, and fine-tuning and inference scripts and strategies are provided, allowing users to easily train their own music generation models.

<a name="What's New"></a>
## What's New 🔥

- 2025/01: Open-sourced the [InspireMusic-Base](https://modelscope.cn/models/iic/InspireMusic/summary), [InspireMusic-Base-24kHz](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [InspireMusic-1.5B](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [InspireMusic-1.5B-24kHz](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), and [InspireMusic-1.5B-Long](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary) models for music generation.
- 2024/12: Added support for generating 48kHz audio with super-resolution flow matching.
- 2024/11: Welcome to preview 👉🏻 [**InspireMusic Demos**](https://iris2c.github.io/InspireMusic) 👈🏻. We're excited to share this with you and are working hard to bring even more features and models soon. Your support and feedback mean a lot to us!
- 2024/11: We are thrilled to announce the open-sourcing of the **InspireMusic** [code repository](https://github.com/FunAudioLLM/InspireMusic) and [demos](https://iris2c.github.io/InspireMusic). **InspireMusic** is a unified framework for music, song, and audio generation, featuring capabilities such as text-to-music conversion, music structure and genre control, and timestamp management. InspireMusic stands out for its exceptional music generation and instruction-following abilities.
```sh
# Download the model via git. Please make sure git lfs is installed first.
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/InspireMusic-1.5B-Long.git pretrained_models/InspireMusic
```

### Available Models
Currently, we open-source music generation models that support 24kHz mono and 48kHz stereo audio.
The table below presents the links to the ModelScope and Hugging Face model hubs. More models will be available soon.

| Model name | Model Links | Remarks |
|-------------------------|-------------|---------|
| InspireMusic-Base-24kHz | [ModelScope](https://modelscope.cn/models/iic/InspireMusic-Base-24kHz/summary), [Hugging Face](https://huggingface.co/FunAudioLLM/InspireMusic-Base-24kHz) | Pre-trained Music Generation Model, 24kHz mono |
| InspireMusic-Base | [ModelScope](https://modelscope.cn/models/iic/InspireMusic-Base/summary), [Hugging Face](https://huggingface.co/FunAudioLLM/InspireMusic-Base) | Pre-trained Music Generation Model, 48kHz |
| InspireMusic-1.5B-24kHz | [ModelScope](https://modelscope.cn/models/iic/InspireMusic-1.5B-24kHz/summary), [Hugging Face](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-24kHz) | Pre-trained Music Generation 1.5B Model, 24kHz mono |
| InspireMusic-1.5B | [ModelScope](https://modelscope.cn/models/iic/InspireMusic-1.5B/summary), [Hugging Face](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B) | Pre-trained Music Generation 1.5B Model, 48kHz |
| InspireMusic-1.5B-Long | [ModelScope](https://modelscope.cn/models/iic/InspireMusic-1.5B-Long/summary), [Hugging Face](https://huggingface.co/FunAudioLLM/InspireMusic-1.5B-Long) | Pre-trained Music Generation 1.5B Model, 48kHz, supports long-form audio |
| InspireSong-1.5B | Coming soon | Pre-trained Song Generation 1.5B Model, 48kHz stereo |
| InspireAudio-1.5B | Coming soon | Pre-trained Audio Generation 1.5B Model, 48kHz stereo |
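The clone command above follows a consistent ModelScope URL pattern. As a convenience, the same pattern can fetch any model in the table — note the helper function below is illustrative, not part of the toolkit, and the `iic/<name>.git` pattern is generalized from the InspireMusic-1.5B-Long example, so verify it per model before use:

```shell
#!/bin/sh
# Hypothetical helper: build the ModelScope clone URL for a model name
# from the table above (URL pattern assumed from the example clone).
modelscope_url() {
    echo "https://www.modelscope.cn/iic/$1.git"
}

# Usage, e.g.:
#   git clone "$(modelscope_url InspireMusic-Base-24kHz)" pretrained_models/InspireMusic-Base-24kHz
modelscope_url InspireMusic-Base-24kHz
```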

## Basic Usage

### Training

Here is an example of training the LLM model; FP16 training is supported.
```sh
torchrun --nnodes=1 --nproc_per_node=8 \
    --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
    ...
    --fp16
```
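`torchrun` launches `--nnodes × --nproc_per_node` worker processes, so the command above starts 8 training processes on a single 8-GPU node. A quick sanity check of that arithmetic:

```shell
#!/bin/sh
# World size implied by the torchrun flags above: nnodes * nproc_per_node.
nnodes=1
nproc_per_node=8
world_size=$((nnodes * nproc_per_node))
echo "$world_size"   # one process per GPU
```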

Here is an example of training the flow matching model; FP16 training is not supported.
```sh
torchrun --nnodes=1 --nproc_per_node=8 \
    --rdzv_id=1024 --rdzv_backend="c10d" --rdzv_endpoint="localhost:0" \
    ...
    --prefetch 100 \
    --pin_memory \
    --deepspeed_config ./conf/ds_stage2.json \
    --deepspeed.save_states model+optimizer
```
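The command above points at `./conf/ds_stage2.json`, which ships with the repository. For orientation, a minimal DeepSpeed ZeRO stage-2 config looks roughly like the sketch below — the field values are illustrative assumptions, not the repository's actual settings:

```shell
#!/bin/sh
# Write an illustrative ZeRO stage-2 DeepSpeed config; the real
# ./conf/ds_stage2.json in the repo may use different values.
cat > /tmp/ds_stage2_example.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 1,
  "gradient_clipping": 1.0,
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true
  }
}
EOF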

### Inference

```sh
for task in 'text-to-music' 'continuation'; do
  ...
  --wavtokenizer $pretrained_model_dir/wavtokenizer \
  --result_dir `pwd`/exp/inspiremusic/${task}_test \
  --chorus verse \
  --fast \
  --min_generate_audio_seconds 8 \
  --max_generate_audio_seconds 30
done
```
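`--min_generate_audio_seconds` and `--max_generate_audio_seconds` bound the output length (8–30 s above). For intuition only, the bounding behaves like the clamp below — the helper is illustrative, not the toolkit's implementation:

```shell
#!/bin/sh
# Clamp a duration into the [min, max] window, mirroring the effect of
# the two flags above (illustrative logic, not InspireMusic code).
clamp_seconds() {
    s=$1; min=$2; max=$3
    [ "$s" -lt "$min" ] && s=$min
    [ "$s" -gt "$max" ] && s=$max
    echo "$s"
}

clamp_seconds 45 8 30   # prints 30: capped at the maximum
```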

## Community & Discussion
* Please support our community project 🌟 by starring it on GitHub 🙏
* You are welcome to join our DingTalk and WeChat groups to share and discuss algorithms, technology, and user-experience feedback. Scan the QR codes below to join our official chat groups.

<p align="center">
  <table>
    <tr>
      <td style="text-align:center;">
        <a href="./asset/QR.jpg"><img alt="FunAudioLLM in DingTalk" src="https://img.shields.io/badge/FunAudioLLM-DingTalk-d9d9d9"></a>
      </td>
      <td style="text-align:center;">
        <a href="./asset/QR.jpg"><img alt="InspireMusic in WeChat" src="https://img.shields.io/badge/InspireMusic-WeChat-d9d9d9"></a>
      </td>
    </tr>
    <tr>
      <td style="text-align:center;">
        <img alt="Light" src="./asset/dingding.png" width="68%" />
      </td>
      <td style="text-align:center;">
        <img alt="Light" src="./asset/QR.jpg" width="58%" />
      </td>
    </tr>
  </table>
</p>

* [GitHub Discussions](https://github.com/FunAudioLLM/InspireMusic/discussions): best for sharing feedback and asking questions.
* [GitHub Issues](https://github.com/FunAudioLLM/InspireMusic/issues): best for reporting bugs you encounter when using InspireMusic and for feature proposals.

## Acknowledgements
We borrowed a lot of code from:
1. [CosyVoice](https://github.com/FunAudioLLM/CosyVoice)
2. [WavTokenizer](https://github.com/jishengpeng/WavTokenizer)
3. [AcademiCodec](https://github.com/yangdongchao/AcademiCodec)
4. [FunASR](https://github.com/modelscope/FunASR)
5. [FunCodec](https://github.com/modelscope/FunCodec)
6. [Matcha-TTS](https://github.com/shivammehta25/Matcha-TTS)
7. [WeNet](https://github.com/wenet-e2e/wenet)

## Disclaimer
The content provided above is for academic purposes only and is intended to demonstrate technical capabilities. Some examples are sourced from the internet. If any content infringes on your rights, please contact us to request its removal.