license: mit
title: Applio
sdk: gradio
colorFrom: green
colorTo: green
emoji: 🍏
startup_duration_timeout: 1h

Applio


VITS-based Voice Conversion focused on simplicity, quality and performance

🌐 Website • 📚 Documentation • ☎️ Discord

🛒 Plugins • 📦 Compiled • 🎮 Playground • 🔎 Google Colab (UI) • 🔎 Google Colab (No UI)

Content Table

  • Installation
  • Usage
  • Repository Enhancements
  • Commercial Usage
  • References
  • Contributors

Installation

Download the latest version from GitHub Releases or use the Compiled Versions.

Windows

./run-install.bat

Linux

The installer may fail on some Linux distributions. If it does, install the dependencies manually with pip install -r requirements.txt inside a Python 3.9 to 3.11 environment.

chmod +x run-install.sh
./run-install.sh

Makefile

For platforms such as Paperspace

make run-install

Usage

Visit Applio Documentation for a detailed UI usage explanation.

Windows

./run-applio.bat

Linux

chmod +x run-applio.sh
./run-applio.sh

Makefile

For platforms such as Paperspace

make run-applio

Repository Enhancements

This repository has undergone significant enhancements to improve its functionality and maintainability:

  • Modular Codebase: Restructured codebase following a modular approach for better organization, readability, and maintenance.
  • Hop Length Implementation: Implemented hop length, courtesy of @Mangio621, boosting efficiency and performance, especially on Crepe (formerly Mangio-Crepe).
  • Translations in 30+ Languages: Added support for translations in over 30 languages, enhancing accessibility for a global audience.
  • Cross-Platform Compatibility: Ensured seamless operation across various platforms for a consistent user experience.
  • Optimized Requirements: Fine-tuned project requirements for enhanced performance and resource efficiency.
  • Streamlined Installation: Simplified installation process for a user-friendly setup experience.
  • Hybrid F0 Estimation: Introduced a custom 'hybrid' F0 estimation method that combines the F0 estimates of several methods by taking their nanmedian (the median with missing values ignored).
  • Easy-to-Use UI: Implemented a user-friendly interface for intuitive interaction.
  • Optimized Code & Dependencies: Enhanced code and streamlined dependencies for improved efficiency.
  • Plugin System: Introduced a plugin system for extending functionality and customization.
  • Overtraining Detector: Implemented an overtraining detector which halts training once a specified epoch limit is reached, preventing excessive training.
  • Model Search: Integrated a model search feature directly into the application interface, facilitating easy model discovery.
  • Enhancements in Pretrained Models: Introduced additional functionalities such as custom pretrained models, allowing users to utilize their preferred pretrained models without requiring RVC1 pretrained models upon installation.
  • Voice Blender: Developed a voice blender feature that combines two trained models to create a new one, offering versatility in model generation.
  • Accessibility Improvements: Enhanced accessibility with descriptive tooltips indicating the function of each element in the user interface, making it more user-friendly for all users.
  • New F0 Extraction Methods: Introduced new F0 extraction methods such as FCPE or Hybrid, expanding options for pitch extraction.
  • Output Format Selection: Implemented an output format selection feature, allowing users to choose the format in which they want to save their audio files.
  • Hashing System: Implemented a hashing system where each created model is assigned a unique ID to prevent unauthorized duplication or theft.
  • Model Download System: Added support for downloading models from various websites such as Google Drive, Yandex, Pixeldrain, Discord, Hugging Face, or Applio.org, enhancing model accessibility.
  • TTS Enhancements: Improved Text-to-Speech functionality with support for uploading TXT files, increasing flexibility in input methods.
  • Split Audio: Implemented audio splitting functionality which divides audio into segments for inference, subsequently merging them to create the final audio, resulting in faster processing times and potentially better outcomes.
  • Discord Presence: Shows a Discord presence while Applio is in use, with plans to add different statuses for different activities within the application.
  • Flask Integration: Flask integration, disabled by default, allows models to be downloaded automatically from the web by clicking the Applio button next to the model download button in the settings tab.
  • Support Tab: Added a support tab where users can record their screen to demonstrate an issue and then create a GitHub issue for review, which speeds up troubleshooting.
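
The Hybrid F0 Estimation item in the list above names nanmedian as the step that combines several pitch estimates. A minimal sketch of that idea with NumPy is shown here. The function name hybrid_f0, the use of NaN for frames where a method found no pitch, and the trimming of all tracks to the shortest one are assumptions made for illustration, not Applio's actual code.

```python
import warnings

import numpy as np


def hybrid_f0(f0_tracks):
    """Combine several per-frame F0 tracks (in Hz) into one track.

    Hypothetical helper: each track is a sequence of floats, with NaN
    marking frames where that method produced no estimate.
    """
    # Different estimators can return slightly different frame counts,
    # so trim every track to the shortest one before stacking.
    min_len = min(len(track) for track in f0_tracks)
    stacked = np.stack(
        [np.asarray(track[:min_len], dtype=float) for track in f0_tracks]
    )

    # nanmedian takes the per-frame median while ignoring NaN entries.
    # A frame that is NaN in every track triggers a RuntimeWarning and
    # yields NaN, so the warning is silenced and the frame is set to 0.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        combined = np.nanmedian(stacked, axis=0)
    return np.nan_to_num(combined, nan=0.0)


if __name__ == "__main__":
    nan = float("nan")
    method_a = [100.0, 110.0, nan, nan]
    method_b = [102.0, 220.0, 130.0, nan]  # 220.0 is an octave error
    method_c = [101.0, 112.0, 128.0, nan]
    print(hybrid_f0([method_a, method_b, method_c]))
```

In the second frame the median of 110, 220 and 112 is 112, so a single octave error from one method does not reach the combined track. That robustness is the usual reason to prefer a median over a mean here.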

These enhancements contribute to a more robust and scalable codebase, making the repository more accessible for contributors and users alike.
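
The Split Audio item in the list above describes dividing audio into segments, running inference on each, and merging the results. The sketch below illustrates that flow under stated assumptions: it uses fixed-length segments, and both split_process_merge and the process callback are hypothetical names standing in for the real conversion step. Applio's own implementation may choose split points differently, for example at silences.

```python
import numpy as np


def split_process_merge(audio, segment_len, process):
    """Split a 1-D audio array into segments, process each, and merge.

    Hypothetical helper: `process` stands in for per-segment inference
    and must return a 1-D array for every segment it receives.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be a positive number of samples")

    # Cut the signal into consecutive chunks; the last one may be shorter.
    segments = [
        audio[start:start + segment_len]
        for start in range(0, len(audio), segment_len)
    ]
    if not segments:
        return audio.copy()

    # Run the conversion on each chunk, then join the chunks in order.
    converted = [process(segment) for segment in segments]
    return np.concatenate(converted)


if __name__ == "__main__":
    # One second of a 440 Hz tone at 16 kHz, processed in 0.25 s chunks.
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 440.0 * t)

    # Placeholder "inference": halve the amplitude of each segment.
    output = split_process_merge(tone, sample_rate // 4, lambda seg: seg * 0.5)
    print(len(output) == len(tone))
```

Hard cuts at fixed positions can produce audible discontinuities when the per-segment processing is not sample-aligned, so a real pipeline would likely cut at quiet points or crossfade neighbouring segments.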

Commercial Usage

This project is released under the MIT license. If you intend to use Applio for commercial purposes, please contact us first at support@applio.org so we can ensure the tool is used ethically. We would also appreciate a donation to support the ongoing development and maintenance of Applio. Thank you for your cooperation and support!

References

Contributors