<!-- ## Why learn a new language, when your model can learn it for you?
<div style="max-width: 720px;max-height: 405px;margin: auto;">
<div style="float: none;clear: both;position: relative;padding-bottom: 56.25%;height: 0;width: 100%">
<iframe width="720" height="405" src="https://www.youtube.com/embed/toqdD1F_ZsU" title="YouTube video player" style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen>
</iframe>
</div>
</div>
### Abstract
Recent studies in talking face generation have focused on building a train-once-use-everywhere model i.e. a model that will generalize from any source speech to any target identity. A number of works have already claimed this functionality and have added that their models will also generalize to any language. However, we show, using languages from different language families, that these models do not translate well when the training language and the testing language are sufficiently different. We reduce the scope of the problem to building a language-robust talking face generation system on seen identities i.e. the target identity is the same as the training identity. In this work, we introduce a talking face generation system that will generalize to different languages. We evaluate the efficacy of our system using a multilingual text-to-speech system. We also discuss the usage of joint text-to-speech system and the talking face generation system as a neural dubber system. -->
## News
(2022.08.18.) We got the CVPR Hugging Face prize! Thank you all, and special thanks to AK ([@akhaliq](https://huggingface.co/akhaliq)).
<center>
<img alt="we-got-huggingface-prize" src="https://github.com/deepkyu/ml-talking-face/blob/main/docs/we-got-huggingface-prize.jpeg?raw=true" width="50%" />
</center>
<br/>
(2023.10.20.) It has been a year since the demonstration was suddenly shut down by MINDsLab (now MAUM.AI).
Today, I'm happy to share that I have restored the demonstration on my own Lambda Labs instance!
Over the past year, there have been numerous advancements in Gen AI, including multilingual TTS and talking face generation.
This demo may look "old-fashioned" by now... but I hope it will help other researchers on the same journey in this field.
I'm now running it on an A10G instance from Lambda Labs at my own expense... I'm sorry, but I don't know when it will shut down again. 😵‍💫 I'll keep you posted on the status.
<center><a href="https://www.buymeacoffee.com/deepkyu" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 35px !important;width: 160px !important;" ></a></center>