<div align="center">
<h1>Fish Speech</h1>

[English](../README.md) | [简体中文](README.zh.md) | [Portuguese](README.pt-BR.md) | [日本語](README.ja.md) | **한국어** <br>

<a href="https://www.producthunt.com/posts/fish-speech-1-4?embed=true&utm_source=badge-featured&utm_medium=badge&utm_souce=badge-fish&#0045;speech&#0045;1&#0045;4" target="_blank">
    <img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=488440&theme=light" alt="Fish&#0032;Speech&#0032;1&#0046;4 - Open&#0045;Source&#0032;Multilingual&#0032;Text&#0045;to&#0045;Speech&#0032;with&#0032;Voice&#0032;Cloning | Product Hunt" style="width: 250px; height: 54px;" width="250" height="54" />
</a>
<a href="https://trendshift.io/repositories/7014" target="_blank">
    <img src="https://trendshift.io/api/badge/repositories/7014" alt="fishaudio%2Ffish-speech | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/>
</a>
<br>
</div>
<br>

<div align="center">
    <img src="https://count.getloli.com/get/@fish-speech?theme=asoul" /><br>
</div>
<br>

<div align="center">
    <a target="_blank" href="https://discord.gg/Es5qTB9BcN">
        <img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
    </a>
    <a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
        <img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
    </a>
    <a target="_blank" href="https://huggingface.co/spaces/fishaudio/fish-speech-1">
        <img alt="Huggingface" src="https://img.shields.io/badge/🤗%20-space%20demo-yellow"/>
    </a>
</div>

This codebase and all models are released under the CC-BY-NC-SA-4.0 license. See [LICENSE](LICENSE) for details.

---

## Features

1. **Zero-shot & Few-shot TTS:** Provide a 10- to 30-second voice sample to generate high-quality TTS output. **For detailed guidance, see the [best practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**

2. **Multilingual & Cross-lingual Support:** Copy and paste text into the input box without worrying about which language it is in. English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish are currently supported.

3. **No Phoneme Dependency:** The model generalizes strongly and does not rely on phonemes for TTS. It handles text in any language's script with ease.

4. **High Accuracy:** Achieves a character error rate (CER) and word error rate (WER) of just 2% on 5-minute English texts.

5. **Fast:** With fish-tech acceleration, the real-time factor (RTF) is about 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.

6. **Web UI Inference:** Provides an easy-to-use, Gradio-based web UI that is compatible with Chrome, Firefox, Edge, and other browsers.

7. **GUI Inference:** Provides a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See the GUI](https://github.com/AnyaCoder/fish-speech-gui).

8. **Deployment-Friendly:** An inference server with native support for Linux, Windows, and macOS is easy to set up, which minimizes speed loss.
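
Feature 4 quotes CER and WER without stating how they were measured. As a rough illustration only, both metrics are commonly defined as the Levenshtein edit distance between a reference text and a transcript of the synthesized audio, divided by the length of the reference. The sketch below assumes that definition. The function names and sample strings are hypothetical and are not part of Fish Speech.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: the text sent to TTS vs. an ASR transcript of the audio.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over the lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
print(f"CER: {char_error_rate(reference, hypothesis):.3f}")
```

In this example one of nine words differs, so the WER is 1/9, while the CER is much lower because only two characters changed.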

## Disclaimer

We accept no responsibility for any illegal use of this codebase. Please refer to your local laws regarding the DMCA and other related legislation.

## Online Demo

[Fish Audio](https://fish.audio)

## Quick Start for Local Inference

[inference.ipynb](/inference.ipynb)

## Videos

#### V1.4 Demo Video: [YouTube](https://www.youtube.com/watch?v=Ghc8cJdQyKQ)

## Documents

- [English](https://speech.fish.audio/)
- [中文](https://speech.fish.audio/zh/)
- [日本語](https://speech.fish.audio/ja/)
- [Portuguese (Brazil)](https://speech.fish.audio/pt/)
- [한국어](https://speech.fish.audio/ko/)

## Samples (2024/10/02 V1.4)

- [English](https://speech.fish.audio/samples/)
- [中文](https://speech.fish.audio/zh/samples/)
- [日本語](https://speech.fish.audio/ja/samples/)
- [Portuguese (Brazil)](https://speech.fish.audio/pt/samples/)
- [한국어](https://speech.fish.audio/ko/samples/)

## Credits

- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)

## Sponsor

<div>
  <a href="https://6block.com/">
    <img src="https://avatars.githubusercontent.com/u/60573493" width="100" height="100" alt="6Block Avatar"/>
  </a>
  <br>
  <a href="https://6block.com/">Data processing sponsored by 6Block</a>
</div>
<div>
  <a href="https://www.lepton.ai/">
    <img src="https://www.lepton.ai/favicons/apple-touch-icon.png" width="100" height="100" alt="Lepton Avatar"/>
  </a>
  <br>
  <a href="https://www.lepton.ai/">Fish Audio is served on Lepton.AI</a>
</div>