Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | ICLR 2024 Spotlight
This is the official repository of Real3D-Portrait, a PyTorch implementation for one-shot, highly realistic talking portrait synthesis. Visit our Demo Page to watch demo videos, and read our Paper for technical details.
Quick Start!
Environment Installation
Please refer to the Installation Guide to prepare a Conda environment named real3dportrait.
Download Pre-trained & Third-Party Models
3DMM BFM Model
Download the 3DMM BFM Model from Google Drive or BaiduYun Disk (password: m9q5).
Put all the files in deep_3drecon/BFM. The file structure should look like this:
deep_3drecon/BFM/
├── 01_MorphableModel.mat
├── BFM_exp_idx.mat
├── BFM_front_idx.mat
├── BFM_model_front.mat
├── Exp_Pca.bin
├── facemodel_info.mat
├── index_mp468_from_mesh35709.npy
├── mediapipe_in_bfm53201.npy
└── std_exp.txt
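Before moving on, it can help to confirm that every file from the listing above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_bfm_files is hypothetical, and the file names are copied from the listing.

```python
from pathlib import Path

# File names copied from the directory listing above.
EXPECTED_BFM_FILES = [
    "01_MorphableModel.mat",
    "BFM_exp_idx.mat",
    "BFM_front_idx.mat",
    "BFM_model_front.mat",
    "Exp_Pca.bin",
    "facemodel_info.mat",
    "index_mp468_from_mesh35709.npy",
    "mediapipe_in_bfm53201.npy",
    "std_exp.txt",
]


def missing_bfm_files(bfm_dir="deep_3drecon/BFM"):
    """Return the expected file names that are not present in bfm_dir.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that their contents are valid.
    """
    root = Path(bfm_dir)
    return [name for name in EXPECTED_BFM_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_bfm_files()
    if missing:
        print("Missing BFM files:", ", ".join(missing))
    else:
        print("All BFM files are in place.")
```

Run it from the project root so that the default relative path resolves correctly.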
Pre-trained Real3D-Portrait
Download the pre-trained Real3D-Portrait models from Google Drive or BaiduYun Disk (password: 6x4f).
Put the zip files in checkpoints and unzip them. The file structure should look like this:
checkpoints/
├── 240126_real3dportrait_orig
│   ├── audio2secc_vae
│   │   ├── config.yaml
│   │   └── model_ckpt_steps_400000.ckpt
│   └── secc2plane_torso_orig
│       ├── config.yaml
│       └── model_ckpt_steps_100000.ckpt
└── pretrained_ckpts
    └── mit_b0.pth
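The unzip step above can also be scripted. The sketch below is an illustration rather than part of the repository: the function names are hypothetical, and it assumes that each downloaded archive already contains the top-level folders shown in the listing, so extracting directly into checkpoints yields that layout.

```python
import zipfile
from pathlib import Path

# Relative paths copied from the directory listing above.
EXPECTED_CHECKPOINT_FILES = [
    "240126_real3dportrait_orig/audio2secc_vae/config.yaml",
    "240126_real3dportrait_orig/audio2secc_vae/model_ckpt_steps_400000.ckpt",
    "240126_real3dportrait_orig/secc2plane_torso_orig/config.yaml",
    "240126_real3dportrait_orig/secc2plane_torso_orig/model_ckpt_steps_100000.ckpt",
    "pretrained_ckpts/mit_b0.pth",
]


def unzip_checkpoints(ckpt_dir="checkpoints"):
    """Extract every .zip archive found directly inside ckpt_dir, in place.

    Assumption: the archives hold the folder names from the listing at
    their top level. Returns the names of the archives that were extracted.
    """
    root = Path(ckpt_dir)
    extracted = []
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        extracted.append(archive.name)
    return extracted


def missing_checkpoint_files(ckpt_dir="checkpoints"):
    """Return the expected relative paths that do not exist under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED_CHECKPOINT_FILES if not (root / rel).is_file()]
```

Calling unzip_checkpoints() followed by missing_checkpoint_files() from the project root should return an empty list once everything is in place. If the archives are laid out differently, move the extracted folders by hand to match the listing.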
Inference
Currently, we provide a CLI and a Gradio WebUI for inference; a Google Colab notebook will be provided in the future. We support both audio-driven and video-driven methods:
- For audio-driven inference, prepare at least a source image and a driving audio.
- For video-driven inference, prepare at least a source image and a driving expression video.
Gradio WebUI
Run the Gradio WebUI demo, upload your resources on the web page, and click the Generate button to run inference:
python inference/app_real3dportrait.py
CLI Inference
First, switch to the project folder and activate the Conda environment:
cd <Real3DPortraitRoot>
conda activate real3dportrait
export PYTHONPATH=./
For audio-driven inference, provide a source image and a driving audio:
python inference/real3d_infer.py \
--src_img <PATH_TO_SOURCE_IMAGE> \
--drv_aud <PATH_TO_AUDIO> \
--drv_pose <PATH_TO_POSE_VIDEO, OPTIONAL> \
--bg_img <PATH_TO_BACKGROUND_IMAGE, OPTIONAL> \
--out_name <PATH_TO_OUTPUT_VIDEO, OPTIONAL>
For video-driven inference, provide a source image and a driving expression video (passed as the --drv_aud parameter):
python inference/real3d_infer.py \
--src_img <PATH_TO_SOURCE_IMAGE> \
--drv_aud <PATH_TO_EXP_VIDEO> \
--drv_pose <PATH_TO_POSE_VIDEO, OPTIONAL> \
--bg_img <PATH_TO_BACKGROUND_IMAGE, OPTIONAL> \
--out_name <PATH_TO_OUTPUT_VIDEO, OPTIONAL>
Some optional parameters:
- --drv_pose: provides motion pose information; defaults to static poses.
- --bg_img: provides background information; defaults to the background extracted from the source image.
- --mouth_amp: mouth amplitude; a higher value leads to a wider mouth opening.
- --map_to_init_pose: when set to True, the initial pose is mapped to the source pose, and the other poses are transformed accordingly.
- --temperature: the sampling temperature of audio2motion; higher values give more diverse results at the expense of accuracy.
- --out_name: when not assigned, the results are stored at infer_out/tmp/.
- --out_mode: when set to final, only the final result is output; when set to concat_debug, visualizations of several intermediate processes are output as well.
Command-line example:
python inference/real3d_infer.py \
--src_img data/raw/examples/Macron.png \
--drv_aud data/raw/examples/Obama_5s.wav \
--drv_pose data/raw/examples/May_5s.mp4 \
--bg_img data/raw/examples/bg.png \
--out_name output.mp4 \
--out_mode concat_debug
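For batch runs, the same command can be assembled from Python. The sketch below is an illustration under stated assumptions, not part of the repository: build_infer_command and run_infer are hypothetical helpers, only flags documented above are used, and the environment is assumed to be set up as described in this section.

```python
import subprocess
import sys


def build_infer_command(src_img, drv_aud, drv_pose=None, bg_img=None,
                        out_name=None, out_mode=None):
    """Build the argument list for inference/real3d_infer.py.

    Hypothetical helper: optional flags are appended only when a value
    is given, so the script's own defaults apply otherwise.
    """
    cmd = [sys.executable, "inference/real3d_infer.py",
           "--src_img", str(src_img),
           "--drv_aud", str(drv_aud)]
    optional = {
        "--drv_pose": drv_pose,
        "--bg_img": bg_img,
        "--out_name": out_name,
        "--out_mode": out_mode,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_infer(repo_root, **kwargs):
    """Run one inference job from the project root; raises if it fails."""
    subprocess.run(build_infer_command(**kwargs), cwd=repo_root, check=True)


# Mirror the command-line example above; printing does not start inference.
cmd = build_infer_command(
    src_img="data/raw/examples/Macron.png",
    drv_aud="data/raw/examples/Obama_5s.wav",
    drv_pose="data/raw/examples/May_5s.mp4",
    bg_img="data/raw/examples/bg.png",
    out_name="output.mp4",
    out_mode="concat_debug",
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. Call run_infer with the project root to actually start a job.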
ToDo
- Release Pre-trained weights of Real3D-Portrait.
- Release Inference Code of Real3D-Portrait.
- Release Gradio Demo of Real3D-Portrait.
- Release Google Colab of Real3D-Portrait.
- Release Training Code of Real3D-Portrait.
Citation
If you find this repo helpful to your work, please consider citing us:
@article{ye2024real3d,
title={Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis},
author={Ye, Zhenhui and Zhong, Tianyun and Ren, Yi and Yang, Jiaqi and Li, Weichuang and Huang, Jiawei and Jiang, Ziyue and He, Jinzheng and Huang, Rongjie and Liu, Jinglin and others},
journal={arXiv preprint arXiv:2401.08503},
year={2024}
}
@article{ye2023geneface++,
title={GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation},
author={Ye, Zhenhui and He, Jinzheng and Jiang, Ziyue and Huang, Rongjie and Huang, Jiawei and Liu, Jinglin and Ren, Yi and Yin, Xiang and Ma, Zejun and Zhao, Zhou},
journal={arXiv preprint arXiv:2305.00787},
year={2023}
}
@article{ye2023geneface,
title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
journal={arXiv preprint arXiv:2301.13430},
year={2023}
}