
            Open In Colab       Hugging Face Spaces       sd webui-colab       Replicate


1 Xi'an Jiaotong University   2 Tencent AI Lab   3 Ant Group

CVPR 2023

SadTalker

TL;DR:       single portrait image 🙎‍♂️      +       audio 🎤       =       talking head video 🎞.


🔥 Highlight

https://user-images.githubusercontent.com/4397546/231495639-5d4bb925-ea64-4a36-a519-6389917dac29.mp4

  • 🔥 Full image mode is online! Check out here for more details.
still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | input image @bagbag1815
  • 🔥 Several new modes, e.g. still mode, reference mode and resize mode, are online for better and more customized applications.

  • 🔥 Happy to see more community demos on Bilibili, YouTube and Twitter #sadtalker.

📋 Changelog (the previous changelog can be found here)

  • [2023.04.15]: Added an Automatic1111 Colab by @camenduru; thanks for this awesome Colab: sd webui-colab.

  • [2023.04.12]: Added a more detailed sd-webui installation document; fixed a reinstallation problem.

  • [2023.04.12]: Fixed the sd-webui safety issues caused by third-party packages; optimized the output path in the sd-webui extension.

  • [2023.04.08]: ❗️❗️❗️ In v0.0.2, we add a logo watermark to the generated video to prevent abuse, since the results are very realistic.

  • [2023.04.08]: v0.0.2: full image animation, added a Baidu drive link for downloading checkpoints, and optimized the enhancer logic.

🚧 TODO

Previous TODOs
  • Generating 2D face from a single Image.

  • Generating 3D face from Audio.

  • Generating 4D free-view talking examples from audio and a single image.

  • Gradio/Colab Demo.

  • Full body/image Generation.

  • Integrate with stable-diffusion-webui. (stay tuned!)

  • Audio-driven Anime Avatar.

  • Training code for each component.

  • If you have any problems, please view our FAQ before opening an issue.

    โš™๏ธ 1. Installation.

    Tutorials from communities: Chinese Windows tutorial | Japanese tutorial

    Linux:

    1. Install Anaconda, Python and Git.

    2. Create the environment and install the requirements.

    git clone https://github.com/Winfredy/SadTalker.git
    
    cd SadTalker 
    
    conda create -n sadtalker python=3.8
    
    conda activate sadtalker
    
    pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
    
    conda install ffmpeg
    
    pip install -r requirements.txt
    
    ### TTS is optional for the gradio demo.
    ### pip install TTS
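The pinned versions above matter (torch 1.12.1+cu113 in particular). As a quick sanity check, here is a minimal sketch — not part of the repo, with the package pins taken from the commands above — that verifies what actually got installed:

```python
# Hypothetical environment check: confirm the pip-installed versions match
# the pins from the install commands above (the +cu113 suffix is ignored).
from importlib import metadata

PINNED = {"torch": "1.12.1", "torchvision": "0.13.1", "torchaudio": "0.12.1"}

def matches_pin(installed, pin):
    """True if the installed version equals the pin, ignoring a local +cuXXX tag."""
    return installed.split("+", 1)[0] == pin

def check_env():
    """Return a list of human-readable problems; empty means the env looks right."""
    problems = []
    for pkg, pin in PINNED.items():
        try:
            version = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            problems.append(f"{pkg} is not installed")
            continue
        if not matches_pin(version, pin):
            problems.append(f"{pkg} is {version}, expected {pin}")
    return problems

if __name__ == "__main__":
    for problem in check_env():
        print("WARNING:", problem)
```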
    

    Windows (Chinese Windows tutorial):

    1. Install Python 3.10.6, checking "Add Python to PATH".
    2. Install git manually (or scoop install git via scoop).
    3. Install ffmpeg, following this instruction (or scoop install ffmpeg via scoop).
    4. Download the SadTalker repository, for example by running git clone https://github.com/Winfredy/SadTalker.git.
    5. Download the checkpoints and gfpgan files below ↓.
    6. Run start.bat from Windows Explorer as a normal, non-administrator user; a gradio WebUI demo will start.

    Macbook:

    More tips about installation on macOS and the Dockerfile can be found here

    📥 2. Download Trained Models.

    You can run the following script to put all the models in the right place.

    bash scripts/download_models.sh
    

    Other alternatives:

    We also provide an offline patch (gfpgan/), so no model will be downloaded when generating.

    Google Drive: download our pre-trained model from this link (main checkpoints) and gfpgan (offline patch)

    GitHub Release Page: download all the files from the latest GitHub release page, and then put them in ./checkpoints.

    Baidu Yun: we provide the downloaded model in checkpoints (extraction code: sadt) and gfpgan (extraction code: sadt).

    Model Details

    The final folder will be shown as:

    (image: the final checkpoint folder tree)

    Model descriptions:

    | Model | Description |
    | --- | --- |
    | checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
    | checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
    | checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
    | checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
    | checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from the unofficial reproduction of face-vid2vid. |
    | checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in Deep3DFaceReconstruction. |
    | checkpoints/wav2lip.pth | Highly accurate lip-sync model in Wav2lip. |
    | checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in dlib. |
    | checkpoints/BFM | 3DMM library files. |
    | checkpoints/hub | Face detection models used in face-alignment. |
    | gfpgan/weights | Face detection and enhancement models used in facexlib and gfpgan. |
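A partial or interrupted download is a common cause of startup errors, and the file names listed above can be checked mechanically. A minimal sketch — paths taken from the table, everything else an assumption:

```python
# Sketch: verify the expected checkpoint files (from the table above) exist
# under the repo root before running inference.
from pathlib import Path

EXPECTED = [
    "checkpoints/auido2exp_00300-model.pth",
    "checkpoints/auido2pose_00140-model.pth",
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/facevid2vid_00189-model.pth.tar",
    "checkpoints/epoch_20.pth",
    "checkpoints/wav2lip.pth",
    "checkpoints/shape_predictor_68_face_landmarks.dat",
]

def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that are absent under `root`."""
    return [p for p in EXPECTED if not (Path(root) / p).is_file()]
```

If this returns a non-empty list, re-run the download script or fetch the missing files from one of the mirrors above.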

    🔮 3. Quick Start (Best Practice).

    WebUI Demos:

    Online: Huggingface | SDWebUI-Colab | Colab

    Local Automatic1111 stable-diffusion-webui extension: please refer to the Automatic1111 stable-diffusion-webui docs.

    Local gradio demo (similar to our Hugging Face demo) can be run by:

    ## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
    python app.py
    

    Local Windows gradio demo: just double-click webui.bat; the requirements will be installed automatically.

    Manual usage:

    Animate a portrait image with the default config:
    python inference.py --driven_audio <audio.wav> \
                        --source_image <video.mp4 or picture.png> \
                        --enhancer gfpgan 
    

    The results will be saved in results/$SOME_TIMESTAMP/*.mp4.
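Since each run gets its own timestamped folder under results/, the newest output can also be picked up programmatically. A sketch, with the folder layout assumed from the path above:

```python
# Sketch: each inference run writes to results/<timestamp>/, so sorting the
# folder names lexicographically puts the most recent run last.
from pathlib import Path

def latest_result_videos(results_root="results"):
    """Return the .mp4 files from the most recent run, or [] if none exist."""
    root = Path(results_root)
    if not root.is_dir():
        return []
    runs = sorted(p for p in root.iterdir() if p.is_dir())
    if not runs:
        return []
    return sorted(runs[-1].glob("*.mp4"))
```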

    Full body/image Generation:

    Use --still to generate a natural full-body video. You can add --enhancer gfpgan to improve the quality of the generated video.

    python inference.py --driven_audio <audio.wav> \
                        --source_image <video.mp4 or picture.png> \
                        --result_dir <a folder to store results> \
                        --still \
                        --preprocess full \
                        --enhancer gfpgan 
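To process several audio clips against the same portrait, the command above can be wrapped in a small batch driver. This is a hypothetical sketch: the flag names come from the example commands in this section, everything else is illustrative.

```python
# Hypothetical batch driver around inference.py; flag names are taken from
# the example commands above, the helpers themselves are not part of the repo.
import subprocess
from pathlib import Path

def build_cmd(audio, image, result_dir="results", still=True, enhancer="gfpgan"):
    """Assemble the inference.py invocation for one audio/image pair."""
    cmd = [
        "python", "inference.py",
        "--driven_audio", str(audio),
        "--source_image", str(image),
        "--result_dir", str(result_dir),
        "--preprocess", "full",
        "--enhancer", enhancer,
    ]
    if still:
        cmd.append("--still")
    return cmd

def run_batch(audio_dir, image, result_dir="results"):
    """Run inference once per .wav file found in audio_dir."""
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        subprocess.run(build_cmd(wav, image, result_dir), check=True)
```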
    

    More examples, configurations, and tips can be found in the >>> best practice documents <<<.

    🛎 Citation

    If you find our work useful in your research, please consider citing:

    @article{zhang2022sadtalker,
      title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
      author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
      journal={arXiv preprint arXiv:2211.12194},
      year={2022}
    }
    

    💗 Acknowledgements

    Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and PIRender. We thank the authors for sharing their wonderful code. In the training process, we also use models from Deep3DFaceReconstruction and Wav2lip, and we thank them for their wonderful work.

    See also these wonderful 3rd libraries we use:

    🥂 Extensions:

    🥂 Related Works

    📢 Disclaimer

    This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.

    LOGO: color and font suggestion: ChatGPT; logo font: Montserrat Alternates.

    All the copyright of the demo images and audio belongs to community users or comes from stable diffusion generation. Feel free to contact us if you feel uncomfortable.