
Introduction

!!! warning
    We assume no responsibility for any illegal use of the codebase. Please refer to the DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
    This codebase and all models are released under the CC-BY-NC-SA-4.0 license.

Requirements

  • GPU Memory: 4GB (for inference), 8GB (for fine-tuning)
  • System: Linux, Windows
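To see which of these tiers a machine meets, you can read the first GPU's total memory with `nvidia-smi`. This is a minimal sketch that assumes an NVIDIA card with the driver's `nvidia-smi` tool on `PATH`. The `vram_tier` name and the rounding to the nearest GiB are illustrative choices, not part of fish-speech. The rounding allows for drivers that report slightly less than a card's nominal size.

```bash
#!/usr/bin/env bash
# Sketch: map total GPU memory to the requirement tiers listed above.
# Assumption (not from fish-speech): the total is rounded to the nearest GiB,
# since drivers may report slightly less than a card's nominal size.
vram_tier() {
  local mib=$1
  local gib=$(( (mib + 512) / 1024 ))  # round MiB to the nearest GiB
  if (( gib >= 8 )); then
    echo "inference + fine-tuning"
  elif (( gib >= 4 )); then
    echo "inference only"
  else
    echo "below requirements"
  fi
}

# Query the first NVIDIA GPU, if nvidia-smi is available on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
  total_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1 | tr -d ' ')
  # Only classify when nvidia-smi returned a plain number.
  if [[ $total_mib =~ ^[0-9]+$ ]]; then
    echo "GPU 0: ${total_mib} MiB -> $(vram_tier "$total_mib")"
  else
    echo "Could not read GPU memory from nvidia-smi" >&2
  fi
fi
```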

Windows Setup

Professional Windows users may consider using WSL2 or Docker to run the codebase.

```bash
# Create a Python 3.10 virtual environment (virtualenv also works)
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install PyTorch (CUDA 12.1 build)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install fish-speech
pip3 install -e .

# (To enable acceleration) Install triton-windows
pip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl
```

Non-professional Windows users can follow these basic steps to run the project without a Linux environment (including model compilation support, i.e., `torch.compile`):

  1. Extract the project package.
  2. Double-click `install_env.bat` to install the environment.
  3. If you want to enable compilation acceleration, follow these steps:
    1. Download the LLVM compiler from the following links:
    2. Download and install the Microsoft Visual C++ Redistributable to fix potential missing `.dll` issues:
    3. Download and install Visual Studio Community Edition to get the MSVC++ build tools and resolve LLVM's header file dependencies:
      • Visual Studio Download
      • After installing the Visual Studio Installer, use it to download Visual Studio Community 2022.
      • In the installer, click the Modify button, then find and select the Desktop development with C++ option and download it.
    4. Download and install CUDA Toolkit 12.x.
  4. Double-click `start.bat` to open the WebUI management interface for training and inference. If needed, you can modify `API_FLAGS.txt` as described below.

!!! info "Optional"

    Want to start the inference WebUI?

    Edit the `API_FLAGS.txt` file in the project root directory and modify the first three lines as follows:
    ```
    --infer
    # --api
    # --listen ...
    ...
    ```

!!! info "Optional"

    Want to start the API server?

    Edit the `API_FLAGS.txt` file in the project root directory and modify the first three lines as follows:
    ```
    # --infer
    --api
    --listen ...
    ...
    ```
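If you switch between the WebUI and API layouts often, the edit can be scripted. This is a minimal sketch, assuming a POSIX shell (for example Git Bash or WSL) and that the flags sit one per line as in the examples above. `set_api_mode` is a hypothetical helper, not something shipped with the project. It only comments or uncomments the `--infer`, `--api`, and `--listen` lines and keeps any value after a flag (such as the `--listen` address) unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of fish-speech): switch API_FLAGS.txt between
# the WebUI layout (--infer on, --api/--listen commented out) and the API
# server layout (--infer commented out, --api/--listen on).
set_api_mode() {
  local mode=$1 file=${2:-API_FLAGS.txt} tmp status=0
  case $mode in
    webui|api) ;;
    *) echo "usage: set_api_mode webui|api [file]" >&2; return 2 ;;
  esac
  tmp=$(mktemp) || return 1
  awk -v mode="$mode" '
    {
      # Drop leading blanks and one "#" marker to see which flag the line holds.
      flag = $0
      sub(/^[ \t]*#?[ \t]*/, "", flag)
      if (flag ~ /^--infer([ \t]|$)/) {
        print (mode == "webui" ? flag : "# " flag)
      } else if (flag ~ /^--(api|listen)([ \t]|$)/) {
        print (mode == "api" ? flag : "# " flag)
      } else {
        print $0  # leave every other line untouched
      }
    }' "$file" > "$tmp" && cat "$tmp" > "$file" || status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, running `set_api_mode api` from the project root switches `API_FLAGS.txt` to the API server layout. The result is written back with `cat` rather than `mv` so the original file keeps its permissions.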

!!! info "Optional"

    Double-click `run_cmd.bat` to enter the conda/Python command-line environment of this project.

Linux Setup

```bash
# Create a Python 3.10 virtual environment (virtualenv also works)
conda create -n fish-speech python=3.10
conda activate fish-speech

# Install PyTorch
pip3 install torch torchvision torchaudio

# Install fish-speech
pip3 install -e .[stable]

# (Ubuntu / Debian users) Install sox
apt install libsox-dev
```
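After installation, a quick sanity check can confirm that the environment is active, that it runs Python 3.10, and that PyTorch sees the GPU. This is a sketch, not part of the project: the `check_install` name is hypothetical, and it assumes the environment name `fish-speech` used above and that the editable install exposes the `fish_speech` package.

```bash
#!/usr/bin/env bash
# Sketch of a post-install check (not shipped with fish-speech).
# Assumes the conda environment is named "fish-speech".
check_install() {
  # `conda activate` exports the active environment's name in CONDA_DEFAULT_ENV.
  if [ "${CONDA_DEFAULT_ENV:-}" != "fish-speech" ]; then
    echo "Activate the environment first: conda activate fish-speech" >&2
    return 1
  fi
  # The environment was created with Python 3.10; verify the interpreter matches.
  python -c 'import sys; assert sys.version_info[:2] == (3, 10), sys.version' || return 1
  # Confirm the editable install of fish-speech is importable.
  python -c 'import fish_speech' || { echo "fish_speech is not importable" >&2; return 1; }
  # Report the PyTorch build and whether it can see a CUDA GPU.
  python -c 'import torch; print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())'
}
```

Run `check_install` after `conda activate fish-speech`. A `CUDA available: False` result usually points to a CPU-only PyTorch build or a missing NVIDIA driver.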

Changelog

  • 2024/09/10: Updated Fish-Speech to version 1.4, with a larger dataset and the quantizer's n_groups changed from 4 to 8.
  • 2024/07/02: Updated Fish-Speech to version 1.2, removed the VITS decoder, and greatly enhanced zero-shot ability.
  • 2024/05/10: Updated Fish-Speech to version 1.1, implementing a VITS decoder to reduce WER and improve timbre similarity.
  • 2024/04/22: Finished Fish-Speech version 1.0, with significant modifications to the VQGAN and LLAMA models.
  • 2023/12/28: Added LoRA fine-tuning support.
  • 2023/12/27: Added gradient checkpointing, causal sampling, and flash-attn support.
  • 2023/12/19: Updated the WebUI and HTTP API.
  • 2023/12/18: Updated fine-tuning documentation and related examples.
  • 2023/12/17: Updated text2semantic model, supporting phoneme-free mode.
  • 2023/12/13: Beta version released, includes VQGAN model and a language model based on LLAMA (phoneme support only).

Acknowledgements