File size: 2,951 Bytes
90f7c1e
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
<img src="img\cover.png">

# Diff-Pitcher (PyTorch)

Official Pytorch Implementation of [Diff-Pitcher: Diffusion-based Singing Voice Pitch Correction](https://engineering.jhu.edu/lcap/data/uploads/pdfs/waspaa2023_hai.pdf)

--------------------

Thank you all for your interest in this research project. I am currently optimizing the model's performance and computation efficiency. I plan to release a user-friendly version, either a GUI or a VST, in the first half of this year, and will update the open-source license.

If you are familiar with PyTorch, you can follow [Code Examples](#examples) to use Diff-Pitcher.

--------------------

Diff-Pitcher

- [Demo Page](#demo)
- [Todo List](#todo)
- [Code Examples](#examples)
- [References](#references)
- [Acknowledgement](#acknowledgement)

## Demo

๐ŸŽต Listen to [examples](https://jhu-lcap.github.io/Diff-Pitcher/)

## Todo
- [x] Update codes and demo
- [x] Support ๐Ÿค— [Diffusers](https://github.com/huggingface/diffusers)
- [x] Upload checkpoints
- [x] Pipeline tutorial
- [ ] Merge to [Your-Stable-Audio](https://github.com/haidog-yaqub/Your-Stable-Audio)
- [ ] Audio Plugin Support
## Examples
- Download checkpoints: ๐ŸŽ’[ckpts](https://github.com/haidog-yaqub/DiffPitcher/tree/main/ckpts)
- Prepare environment: [requirements.txt](requirements.txt)
- Feel free to try:
  - template-based automatic pitch correction: [template_based_apc.py](template_based_apc.py)
  - score-based automatic pitch correction: [score_based_apc.py](score_based_apc.py)


## References

If you find the code useful for your research, please consider citing:

```bibtex
@inproceedings{hai2023diff,
  title={Diff-Pitcher: Diffusion-Based Singing Voice Pitch Correction},
  author={Hai, Jiarui and Elhilali, Mounya},
  booktitle={2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  pages={1--5},
  year={2023},
  organization={IEEE}
}
```

This repo is inspired by:

```bibtex
@article{popov2021diffusion,
  title={Diffusion-based voice conversion with fast maximum likelihood sampling scheme},
  author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail and Wei, Jiansheng},
  journal={arXiv preprint arXiv:2109.13821},
  year={2021}
}
```
```bibtex
@inproceedings{liu2022diffsinger,
  title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Zhao, Zhou},
  booktitle={Proceedings of the AAAI conference on artificial intelligence},
  volume={36},
  number={10},
  pages={11020--11028},
  year={2022}
}
```

## Acknowledgement

[Welcome to LCAP! < LCAP (jhu.edu)](https://engineering.jhu.edu/lcap/)

We borrow code from following repos:

 - `Diffusion Schedulers` are based on ๐Ÿค— [Diffusers](https://github.com/huggingface/diffusers)
 - `2D UNet` is based on [DiffVC](https://github.com/huawei-noah/Speech-Backbones/tree/main/DiffVC)