Voice style control quality issues
Hi,
Very cool work. The example demos sound very impressive, e.g. Elon's voice. I followed the examples in ref [1] and was able to run everything with no issues. I then experimented with adding a few reference voices, e.g. David Attenborough and Morgan Freeman, based on the samples in [2] and [3] (>30 s high-quality audio recordings of their voices). My sample code below follows ref [1] and runs without errors. However, the generated output does not sound like the reference for either voice. Are there additional things to consider, such as the length of the reference recording, the base speaker TTS, or other parameters? Or is it something rather silly?
Sample code:
# Assumes se_extractor, tone_color_converter, base_speaker_tts, source_se and output_dir are set up as in ref [1]
reference_speaker = 'OpenVoice/resources/morgan_freeman_example.mp3'
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)
save_path = f'{output_dir}/output_morgan_freeman.wav'
# Run the base speaker tts
text = "Hello, I am Morgan Freeman, and you are inside the Matrix."
src_path = f'{output_dir}/tmp.wav'
base_speaker_tts.tts(text, src_path, speaker='default', language='English', speed=1.0)
# Run the tone color converter
encode_message = "@MyShell"
tone_color_converter.convert(
    audio_src_path=src_path,
    src_se=source_se,
    tgt_se=target_se,
    output_path=save_path,
    message=encode_message)
References:
[1] https://github.com/myshell-ai/OpenVoice/blob/main/demo_part1.ipynb
[2] https://en.wikipedia.org/wiki/File:Sir_David_Attenborough_BBC_Radio4_Desert_Island_Discs_29_Jan_2012_b01b8yy0.flac
[3] https://en.wikipedia.org/wiki/File:Morgan_freeman_bbc_radio4_film_programme_12_09_2008_b00dbcdn.flac
I'm also having this issue, using high-quality audio samples that are around 3 minutes long. The final output does not match the reference nearly as well as the demo voices do.
What about a shorter reference length? I ask because the ones used as examples seem to be 10 s at most.
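One way to test the reference-length question is to cut the same recording down to a few different lengths and compare the converted outputs. Below is a minimal sketch of a trimming helper that uses only Python's standard `wave` module. `trim_wav` is a hypothetical helper name, not part of OpenVoice, and it assumes the reference has already been converted to an uncompressed PCM WAV file (the `.mp3` and `.flac` files mentioned above would need converting first). Whether a shorter clip actually improves similarity is not established in this thread.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds):
    """Copy at most the first `max_seconds` of a PCM WAV file to `dst_path`.

    Hypothetical helper for experimenting with reference length;
    returns the duration (in seconds) actually written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never request more frames than the file contains.
        n_frames = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and sample rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return n_frames / params.framerate
```

The trimmed file can then be passed as `reference_speaker` to `se_extractor.get_se(...)` exactly as in the sample code above, once per clip length (for example 10 s and 30 s), so the outputs can be compared by ear.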