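# 📚 ebook2audiobook

Convert eBooks to audiobooks with chapters and metadata using Calibre and Coqui XTTS. Supports optional voice cloning and multiple languages!

> [!IMPORTANT]
> **This tool is intended for use with non-DRM, legally acquired eBooks only.** The authors are not responsible for any misuse of this software or any resulting legal consequences.
> Use this tool responsibly and in accordance with all applicable laws.

#### 🖥️ Web GUI

![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)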
## 🌟 Features

- 📖 Converts eBooks to text format with Calibre.
- 📚 Splits eBooks into chapters for organized audio.
- 🎙️ High-quality text-to-speech with Coqui XTTS.
- 🗣️ Optional voice cloning with your own voice file.
- 🌍 Supports multiple languages (English by default).
- 🖥️ Runs on 4GB of RAM.

## 🛠️ Requirements

- Python 3.10
- `coqui-tts` Python package
- Calibre (for eBook conversion)
- FFmpeg (for audiobook creation)
- Optional: a custom voice file for voice cloning

### 🔧 Installation Instructions

1. **Install Python 3.10** from [Python.org](https://www.python.org/downloads/).
2. **Install Calibre**:
   - **Ubuntu**: `sudo apt-get install -y calibre`
   - **macOS**: `brew install calibre`
   - **Windows** (Admin PowerShell): `choco install calibre`
3. **Install FFmpeg**:
   - **Ubuntu**: `sudo apt-get install -y ffmpeg`
   - **macOS**: `brew install ffmpeg`
   - **Windows** (Admin PowerShell): `choco install ffmpeg`
4. **Optional: Install MeCab** (for non-Latin languages):
   - **Ubuntu**: `sudo apt-get install -y mecab libmecab-dev mecab-ipadic-utf8`
   - **macOS**: `brew install mecab`, `brew install mecab-ipadic`
   - **Windows**: [mecab-website-to-install-manually](https://taku910.github.io/mecab/#download) (Note: Japanese support is limited)
5. **Install the Python packages**:

   ```bash
   pip install coqui-tts==0.24.2 pydub nltk beautifulsoup4 ebooklib tqdm gradio==4.44.0
   python -m nltk.downloader punkt
   python -m nltk.downloader punkt_tab
   ```

   **For non-Latin languages**:

   ```bash
   pip install mecab mecab-python3 unidic
   python -m unidic download
   ```

## 🌐 Supported Languages

- **English (en)**
- **Spanish (es)**
- **French (fr)**
- **German (de)**
- **Italian (it)**
- **Portuguese (pt)**
- **Polish (pl)**
- **Turkish (tr)**
- **Russian (ru)**
- **Dutch (nl)**
- **Czech (cs)**
- **Arabic (ar)**
- **Chinese (zh-cn)**
- **Japanese (ja)**
- **Hungarian (hu)**
- **Korean (ko)**

Specify the language code when running the script in headless mode.

## 🚀 Usage

### 🖥️ Launching the Gradio Web Interface

1. **Run the script**:

   ```bash
   python app.py
   ```

2. **Open the web app**: Click the URL provided in the terminal to access the web app and convert eBooks.
3. **Public link**: Add `--share True` to the end of the command, like this: `python app.py --share True`
   - **[More parameters]**: Use the `-h` parameter, e.g. `python app.py -h`

### 📝 Basic Headless Usage

```bash
python app.py --headless True --ebook [path_to_ebook_file] --voice [path_to_voice_file] --language [language_code]
```

- **[path_to_ebook_file]**: Path to your eBook file.
- **[path_to_voice_file]**: Voice file to use for the conversion (optional).
- **[language_code]**: Language for the conversion (optional).
- **[More parameters]**: Use the `-h` parameter, e.g. `python app.py -h`

### 🧩 Headless Usage with a Custom XTTS Model

```bash
python app.py --headless True --use_custom_model True --ebook [path_to_ebook_file] --voice [path_to_voice_file] --language [language_code] --custom_model [path_to_model.pth] --custom_config [path_to_config.json] --custom_vocab [path_to_vocab.json]
```

- **[path_to_ebook_file]**: Path to your eBook file.
- **[path_to_voice_file]**: Voice file to use for the conversion (optional).
- **[language_code]**: Language for the conversion (optional).
- **[path_to_model.pth]**: Path to `model.pth`.
- **[path_to_config.json]**: Path to `config.json`.
- **[path_to_vocab.json]**: Path to `vocab.json`.
- **[More parameters]**: Use the `-h` parameter, e.g. `python app.py -h`

### 🧩 Headless Usage with a Custom XTTS Fine-Tune Model 🌐

```bash
python app.py --headless True --use_custom_model True --ebook [path_to_ebook_file] --voice [path_to_voice_file] --language [language_code] --custom_model_url [custom_model_zip_url]
```

- **[path_to_ebook_file]**: Path to your eBook file.
- **[path_to_voice_file]**: Voice file to use for the conversion (optional).
- **[language_code]**: Language for the conversion (optional).
- **[custom_model_zip_url]**: URL of the zipped model folder. For example, for [xtts_David_Attenborough_fine_tune](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/tree/main): `https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true`
- **[More parameters]**: Use the `-h` parameter, e.g. `python app.py -h`

### 🔍 Detailed Guide with a List of All Parameters

```bash
python app.py -h
```

- This will output the following:

```bash
usage: app.py [-h] [--share] [--headless [HEADLESS]] [--ebook EBOOK]
              [--ebooks_dir [EBOOKS_DIR]] [--voice VOICE]
              [--language LANGUAGE] [--device {cpu,gpu}]
              [--use_custom_model] [--custom_model CUSTOM_MODEL]
              [--custom_config CUSTOM_CONFIG] [--custom_vocab CUSTOM_VOCAB]
              [--custom_model_url CUSTOM_MODEL_URL]
              [--temperature TEMPERATURE] [--length_penalty LENGTH_PENALTY]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K]
              [--top_p TOP_P] [--speed SPEED] [--enable_text_splitting]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either
launch the Gradio interface or run the script in headless mode for direct
conversion.

options:
  -h, --help            show this help message and exit
  --share               Enable a public shareable Gradio link. Defaults to
                        False.
  --headless [HEADLESS]
                        Run in headless mode. Defaults to True if the flag is
                        present without a value, False otherwise.
  --ebook EBOOK         Path to the ebook file for conversion. Required in
                        headless mode.
  --ebooks_dir [EBOOKS_DIR]
                        Path to the directory containing ebooks for batch
                        conversion. Defaults to './ebooks' if 'default' value
                        is provided.
  --voice VOICE         Path to the target voice file for TTS. Optional, uses
                        a default voice if not provided.
  --language LANGUAGE   Language for the audiobook conversion. Options: en,
                        es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja,
                        hu, ko. Defaults to English (en).
  --device {cpu,gpu}    Type of processor unit for the audiobook conversion.
                        Defaults to cpu.
  --use_custom_model    Use a custom TTS model. Defaults to False. Must be
                        True to use custom models.
  --custom_model CUSTOM_MODEL
                        Path to the custom model file (.pth). Required if
                        using a custom model.
  --custom_config CUSTOM_CONFIG
                        Path to the custom config file (config.json). Required
                        if using a custom model.
  --custom_vocab CUSTOM_VOCAB
                        Path to the custom vocab file (vocab.json). Required
                        if using a custom model.
  --custom_model_url CUSTOM_MODEL_URL
                        URL to download the custom model as a zip file.
                        Optional, but will be used if provided. Examples
                        include David Attenborough's model:
                        'https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true'.
                        More XTTS fine-tunes can be found on my Hugging Face
                        at 'https://huggingface.co/drewThomasson'.
  --temperature TEMPERATURE
                        Temperature for the model. Defaults to 0.65. Higher
                        temperatures lead to more creative outputs.
  --length_penalty LENGTH_PENALTY
                        A length penalty applied to the autoregressive
                        decoder. Defaults to 1.0. Not applied to custom
                        models.
  --repetition_penalty REPETITION_PENALTY
                        A penalty that prevents the autoregressive decoder
                        from repeating itself. Defaults to 2.0.
  --top_k TOP_K         Top-k sampling. Lower values mean more likely outputs
                        and increased audio generation speed. Defaults to 50.
  --top_p TOP_P         Top-p sampling. Lower values mean more likely outputs
                        and increased audio generation speed. Defaults to 0.8.
  --speed SPEED         Speed factor for the speech generation. Defaults to
                        1.0.
  --enable_text_splitting
                        Enable splitting text into sentences. Defaults to
                        False.

Example usage:
Windows:
    headless:
    ./ebook2audiobook.cmd --headless --ebook 'path_to_ebook' --voice 'path_to_voice' --language en --use_custom_model --custom_model 'model.zip' --custom_config config.json --custom_vocab vocab.json
    Graphic Interface:
    ./ebook2audiobook.cmd
Linux/Mac:
    headless:
    ./ebook2audiobook.sh --headless --ebook 'path_to_ebook' --voice 'path_to_voice' --language en --use_custom_model --custom_model 'model.zip' --custom_config config.json --custom_vocab vocab.json
    Graphic Interface:
    ./ebook2audiobook.sh
```
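If you script conversions instead of typing them, the headless invocation can be assembled programmatically. The sketch below is a hypothetical helper (`build_headless_argv` is not part of the project) that assumes the flag names and language codes from the `python app.py -h` output above. It treats `--use_custom_model` as a bare flag, as that help output shows; the Docker image's help further below instead expects `--use_custom_model True`, so adjust it to match the version you actually run.

```python
# Minimal sketch: assemble the argv for one headless ebook2audiobook run.
# build_headless_argv is a hypothetical helper; flag names come from the
# `python app.py -h` output shown above.
from typing import List, Optional, Tuple

# Language codes accepted by --language, per the help output.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko",
}


def build_headless_argv(
    ebook: str,
    voice: Optional[str] = None,
    language: str = "en",
    custom_model: Optional[Tuple[str, str, str]] = None,
    custom_model_url: Optional[str] = None,
) -> List[str]:
    """Return the argument list for a headless conversion.

    custom_model is an optional (model.pth, config.json, vocab.json) tuple.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if custom_model and custom_model_url:
        raise ValueError("use local custom model files or a model URL, not both")

    argv = ["python", "app.py", "--headless", "True", "--ebook", ebook]
    if voice:  # optional: the app falls back to a default voice when omitted
        argv += ["--voice", voice]
    argv += ["--language", language]

    if custom_model:
        model_path, config_path, vocab_path = custom_model
        # Assumes --use_custom_model is a bare flag, as in the help output above.
        argv += [
            "--use_custom_model",
            "--custom_model", model_path,
            "--custom_config", config_path,
            "--custom_vocab", vocab_path,
        ]
    elif custom_model_url:
        argv += ["--use_custom_model", "--custom_model_url", custom_model_url]
    return argv
```

For example, `subprocess.run(build_headless_argv("book.epub", language="de"), check=True)` would run a single conversion from the project directory, and `shlex.join(...)` on the same list gives a command you can paste into a terminal. Building a list rather than a string avoids shell-quoting problems with paths that contain spaces.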
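**⚠️ Legacy usage instructions**

## 🚀 Usage (legacy scripts in `ebook2audiobookXTTS/legacy/`)

### 🖥️ Web Interface

1. **Run the script**:

   ```bash
   python custom_model_ebook2audiobookXTTS_gradio.py
   ```

2. **Open the web app**: Click the URL provided in the terminal to access the web app and convert eBooks.

### 📝 Basic Usage

```bash
python ebook2audiobook.py [path_to_ebook_file] [path_to_voice_file] [language_code]
```

- **[path_to_ebook_file]**: Path to your eBook file.
- **[path_to_voice_file]**: Voice file to use for the conversion (optional).
- **[language_code]**: Language for the conversion (optional).

### 🧩 Custom XTTS Model

```bash
python custom_model_ebook2audiobookXTTS.py [path_to_ebook_file] [path_to_voice_file] [language_code] [path_to_model.pth] [path_to_config.json] [path_to_vocab.json]
```

- **[path_to_ebook_file]**: Path to your eBook file.
- **[path_to_voice_file]**: Voice file to use for the conversion (optional).
- **[language_code]**: Language for the conversion (optional).
- **[path_to_model.pth]**: Path to `model.pth`.
- **[path_to_config.json]**: Path to `config.json`.
- **[path_to_vocab.json]**: Path to `vocab.json`.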
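### 🐳 Using Docker

You can also run the eBook-to-audiobook converter with Docker. This approach ensures consistency across different environments and simplifies setup.

#### 🚀 Running the Docker Container

To run the Docker container and launch the Gradio interface, use one of the following commands:

- Run with CPU only:

  ```powershell
  docker run -it --rm -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py
  ```

- Run with GPU acceleration (NVIDIA GPUs only):

  ```powershell
  docker run -it --rm --gpus all -p 7860:7860 --platform=linux/amd64 athomasson2/ebook2audiobookxtts:huggingface python app.py
  ```

Either command starts the Gradio interface on port 7860 (localhost:7860).

- For more options, such as running Docker in headless mode or making the Gradio link public, add the `-h` parameter after `app.py` in the Docker launch command.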
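The CPU and GPU `docker run` commands above differ only in `--gpus all`. As a hedged sketch, a small helper can build either variant; `docker_gradio_argv` is a hypothetical name, and the `host_port` option is an assumption added for illustration (the container side stays on port 7860, as in the commands above).

```python
# Minimal sketch: build the `docker run` command for the Gradio UI.
# docker_gradio_argv is a hypothetical helper, not part of the project.
from typing import List, Sequence

IMAGE = "athomasson2/ebook2audiobookxtts:huggingface"


def docker_gradio_argv(
    use_gpu: bool = False,
    host_port: int = 7860,
    app_args: Sequence[str] = (),
) -> List[str]:
    """Return the argv for the CPU or GPU variant of the commands above."""
    argv = ["docker", "run", "-it", "--rm"]
    if use_gpu:
        # NVIDIA GPUs only; Docker needs GPU support set up on the host.
        argv += ["--gpus", "all"]
    # Publish the container's Gradio port 7860 on the chosen host port.
    argv += ["-p", f"{host_port}:7860", "--platform=linux/amd64", IMAGE]
    # Extra arguments (e.g. "-h") are passed through to app.py.
    argv += ["python", "app.py", *app_args]
    return argv
```

For example, `subprocess.run(docker_gradio_argv(use_gpu=True), check=True)` would start the GPU variant, while `app_args=["-h"]` prints the help text instead of launching the interface.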
**Example of using Docker in headless mode or modifying anything with extra parameters, plus a full guide**

## Example of Using Docker in Headless Mode

First, pull the latest version of the image:

```bash
docker pull athomasson2/ebook2audiobookxtts:huggingface
```

- Before running the conversion command, create a directory named `input-folder` in your current directory. It is mounted into the container, and it is where you place the input files for the Docker image. The same command also creates the `Audiobooks` output directory:

```bash
mkdir input-folder && mkdir Audiobooks
```

- In the command below, replace **YOUR_INPUT_FILE.TXT** with the name of the input file you placed in `input-folder`:

```bash
docker run -it --rm \
    -v $(pwd)/input-folder:/home/user/app/input_folder \
    -v $(pwd)/Audiobooks:/home/user/app/Audiobooks \
    --platform linux/amd64 \
    athomasson2/ebook2audiobookxtts:huggingface \
    python app.py --headless True --ebook /home/user/app/input_folder/YOUR_INPUT_FILE.TXT
```

- And that should be it!
- The output audiobooks will be in the `Audiobooks` folder, which is also located in the local directory where you ran the Docker command.

## To get help for the other parameters of this program, run the following command

```bash
docker run -it --rm \
    --platform linux/amd64 \
    athomasson2/ebook2audiobookxtts:huggingface \
    python app.py -h
```

This will output the following:

```bash
starting...
usage: app.py [-h] [--share SHARE] [--headless HEADLESS] [--ebook EBOOK]
              [--voice VOICE] [--language LANGUAGE]
              [--use_custom_model USE_CUSTOM_MODEL]
              [--custom_model CUSTOM_MODEL] [--custom_config CUSTOM_CONFIG]
              [--custom_vocab CUSTOM_VOCAB]
              [--custom_model_url CUSTOM_MODEL_URL]
              [--temperature TEMPERATURE]
              [--length_penalty LENGTH_PENALTY]
              [--repetition_penalty REPETITION_PENALTY] [--top_k TOP_K]
              [--top_p TOP_P] [--speed SPEED]
              [--enable_text_splitting ENABLE_TEXT_SPLITTING]

Convert eBooks to Audiobooks using a Text-to-Speech model. You can either
launch the Gradio interface or run the script in headless mode for direct
conversion.

options:
  -h, --help            show this help message and exit
  --share SHARE         Set to True to enable a public shareable Gradio link.
                        Defaults to False.
  --headless HEADLESS   Set to True to run in headless mode without the Gradio
                        interface. Defaults to False.
  --ebook EBOOK         Path to the ebook file for conversion. Required in
                        headless mode.
  --voice VOICE         Path to the target voice file for TTS. Optional, uses
                        a default voice if not provided.
  --language LANGUAGE   Language for the audiobook conversion. Options: en,
                        es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh-cn, ja,
                        hu, ko. Defaults to English (en).
  --use_custom_model USE_CUSTOM_MODEL
                        Set to True to use a custom TTS model. Defaults to
                        False. Must be True to use custom models, otherwise
                        you'll get an error.
  --custom_model CUSTOM_MODEL
                        Path to the custom model file (.pth). Required if
                        using a custom model.
  --custom_config CUSTOM_CONFIG
                        Path to the custom config file (config.json). Required
                        if using a custom model.
  --custom_vocab CUSTOM_VOCAB
                        Path to the custom vocab file (vocab.json). Required
                        if using a custom model.
  --custom_model_url CUSTOM_MODEL_URL
                        URL to download the custom model as a zip file.
                        Optional, but will be used if provided. Examples
                        include David Attenborough's model:
                        'https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true'.
                        More XTTS fine-tunes can be found on my Hugging Face
                        at 'https://huggingface.co/drewThomasson'.
  --temperature TEMPERATURE
                        Temperature for the model. Defaults to 0.65. Higher
                        Tempatures will lead to more creative outputs IE: more
                        Hallucinations. Lower Tempatures will be more monotone
                        outputs IE: less Hallucinations.
  --length_penalty LENGTH_PENALTY
                        A length penalty applied to the autoregressive
                        decoder. Defaults to 1.0. Not applied to custom
                        models.
  --repetition_penalty REPETITION_PENALTY
                        A penalty that prevents the autoregressive decoder
                        from repeating itself. Defaults to 2.0.
  --top_k TOP_K         Top-k sampling. Lower values mean more likely outputs
                        and increased audio generation speed. Defaults to 50.
  --top_p TOP_P         Top-p sampling. Lower values mean more likely outputs
                        and increased audio generation speed. Defaults to 0.8.
  --speed SPEED         Speed factor for the speech generation. IE: How fast
                        the Narrerator will speak. Defaults to 1.0.
  --enable_text_splitting ENABLE_TEXT_SPLITTING
                        Enable splitting text into sentences. Defaults to
                        True.

Example: python script.py --headless --ebook path_to_ebook --voice path_to_voice --language en --use_custom_model True --custom_model model.pth --custom_config config.json --custom_vocab vocab.json
```
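Both help outputs list the same tuning defaults: temperature 0.65, length penalty 1.0, repetition penalty 2.0, top-k 50, top-p 0.8 and speed 1.0. As a hedged sketch, the hypothetical helper `tuning_flags` below turns overrides into command-line flags and skips any value equal to those documented defaults, so the final command only shows what you changed. It assumes the defaults in your installed version match the help text.

```python
# Minimal sketch: convert tuning overrides into app.py flags.
# tuning_flags is a hypothetical helper; defaults are copied from the help text.
from typing import Dict, List, Union

Number = Union[int, float]

DEFAULTS: Dict[str, Number] = {
    "temperature": 0.65,
    "length_penalty": 1.0,
    "repetition_penalty": 2.0,
    "top_k": 50,
    "top_p": 0.8,
    "speed": 1.0,
}


def tuning_flags(**overrides: Number) -> List[str]:
    """Return ['--name', 'value', ...] for overrides that differ from the defaults."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown tuning parameter(s): {unknown}")
    flags: List[str] = []
    for name, default in DEFAULTS.items():  # stable order, as listed in -h
        value = overrides.get(name, default)
        if value != default:
            flags += [f"--{name}", str(value)]
    return flags
```

For example, `tuning_flags(temperature=0.5, top_k=30)` yields `['--temperature', '0.5', '--top_k', '30']`, which can be appended to a headless command. Keep in mind that the help text documents `--length_penalty` as not applied to custom models.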
#### 🖥️ Docker GUI

![demo_web_gui](https://github.com/user-attachments/assets/85af88a7-05dd-4a29-91de-76a14cf5ef06)
### 🛠️ About Custom XTTS Models

These are models built to better handle specific voices. Check out my Hugging Face page [here](https://huggingface.co/drewThomasson).

To use a custom model, paste the link to its `Finished_model_files.zip` file, for example:

[David Attenborough fine tuned Finished_model_files.zip](https://huggingface.co/drewThomasson/xtts_David_Attenborough_fine_tune/resolve/main/Finished_model_files.zip?download=true)

For more details, visit the [Docker Hub page](https://hub.docker.com/repository/docker/athomasson2/ebook2audiobookxtts/general).

## 🌐 Fine-Tuned XTTS Models

To find XTTS models that have already been fine-tuned, visit [Hugging Face](https://huggingface.co/drewThomasson) 🌐 and search for models whose names include the keywords "xtts fine tune".

## 🎥 Demos

https://github.com/user-attachments/assets/8486603c-38b1-43ce-9639-73757dfb1031

## 🤗 [Huggingface space demo](https://huggingface.co/spaces/drewThomasson/ebook2audiobookXTTS)

- The Hugging Face space runs on the free CPU tier, so expect it to be very slow or to time out, lol. Just don't give it large files.
- It's best to duplicate the space or run it locally.

## 📚 Supported eBook Formats

- `.epub`, `.pdf`, `.mobi`, `.txt`, `.html`, `.rtf`, `.chm`, `.lit`, `.pdb`, `.fb2`, `.odt`, `.cbr`, `.cbz`, `.prc`, `.lrf`, `.pml`, `.snb`, `.cbc`, `.rb`, `.tcr`
- **Best results**: `.epub` or `.mobi` for automatic chapter detection.

## 📂 Output

- Creates an `.m4b` file with metadata and chapters.
- **Example**: ![Example](https://github.com/DrewThomasson/VoxNovel/blob/dc5197dff97252fa44c391dc0596902d71278a88/readme_files/example_in_app.jpeg)
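Before queuing a batch (for example with `--ebooks_dir`), it can help to filter a folder by extension. This is a minimal sketch using only the format list above; `classify_ebook` and `find_ebooks` are hypothetical helpers, not project code. It looks at file names only, so a supported extension does not guarantee that Calibre can parse the file.

```python
# Minimal sketch: filter files by the supported eBook extensions listed above.
# classify_ebook and find_ebooks are hypothetical helpers, not project code.
from pathlib import Path
from typing import List

SUPPORTED_EXTENSIONS = {
    ".epub", ".pdf", ".mobi", ".txt", ".html", ".rtf", ".chm", ".lit",
    ".pdb", ".fb2", ".odt", ".cbr", ".cbz", ".prc", ".lrf", ".pml",
    ".snb", ".cbc", ".rb", ".tcr",
}
# Formats that give the best results (automatic chapter detection).
BEST_EXTENSIONS = {".epub", ".mobi"}


def classify_ebook(filename: str) -> str:
    """Return 'best', 'supported', or 'unsupported' based on the extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in BEST_EXTENSIONS:
        return "best"
    if suffix in SUPPORTED_EXTENSIONS:
        return "supported"
    return "unsupported"


def find_ebooks(folder: str) -> List[Path]:
    """List convertible files in folder (non-recursive), sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and classify_ebook(p.name) != "unsupported"
    )
```

Checking the suffix case-insensitively means files such as `Book.EPUB` are still picked up.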