- CODE_OF_CONDUCT.md
- CONTRIBUTING.md
- LICENSE.md +159 -0
- README.md
- Untitled.ipynb +0 -0
- doc/__init__.py +0 -0
- doc/landmark_adjust.png +0 -0
- doc/landmark_closemouth.png +0 -0
- doc/landmark_delauney.png +0 -0
- doc/teaser.png +0 -0
- examples/327-3275260_leonardo-dicaprio-png-famous-actor.png +0 -0
- examples/M6_04_16k_av.mp4 +0 -0
- examples/angelina.jpg +0 -0
- examples/anne.jpg +0 -0
- examples/anne2.jpg +0 -0
- examples/audrey.jpg +0 -0
- examples/aya.jpg +0 -0
- examples/babam.jpg +0 -0
- examples/babam_pred_fls_cumhur_basbakan_audio_embed.mp4 +3 -0
- examples/batu.jpg +0 -0
- examples/batu_pred_fls_cumhur_basbakan_audio_embed.mp4 +3 -0
- examples/batu_pred_fls_kutlu_dumble_audio_embed.mp4 +0 -0
- examples/ben.jpg +0 -0
- examples/captain.jpg +0 -0
- examples/captain2.jpg +0 -0
- examples/cesi.jpg +0 -0
- examples/chris.jpg +0 -0
- examples/chris2.jpg +0 -0
- examples/ckpt/ckpt_116_i2i_comb.pth +3 -0
- examples/ckpt/ckpt_autovc.pth +3 -0
- examples/ckpt/ckpt_content_branch.pth +3 -0
- examples/ckpt/ckpt_speaker_branch.pth +3 -0
- examples/cumhur_basbakan_av.mp4 +0 -0
- examples/dali.jpg +0 -0
- examples/dali_pred_fls_cumhur_basbakan_audio_embed.mp4 +3 -0
- examples/donald.jpg +0 -0
- examples/dragonmom.jpg +0 -0
- examples/dump/emb.pickle +3 -0
- examples/dump/random_val_au.pickle +3 -0
- examples/dump/random_val_fl.pickle +3 -0
- examples/dump/random_val_gaze.pickle +3 -0
- examples/dwayne.jpg +0 -0
- examples/dwayne2.jpg +0 -0
- examples/dwayne3.jpg +0 -0
- examples/erto.jpg +0 -0
- examples/erto_pred_fls_cumhur_basbakan_audio_embed.mp4 +3 -0
- examples/erto_pred_fls_kutlu_dumble_audio_embed.mp4 +0 -0
@@ -33,3 +33,16 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
examples_cartoon/napkin_bg.png filter=lfs diff=lfs merge=lfs -text
examples_cartoon/napkin.png filter=lfs diff=lfs merge=lfs -text
examples/babam_pred_fls_cumhur_basbakan_audio_embed.mp4 filter=lfs diff=lfs merge=lfs -text
examples/batu_pred_fls_cumhur_basbakan_audio_embed.mp4 filter=lfs diff=lfs merge=lfs -text
examples/dali_pred_fls_cumhur_basbakan_audio_embed.mp4 filter=lfs diff=lfs merge=lfs -text
examples/erto_pred_fls_cumhur_basbakan_audio_embed.mp4 filter=lfs diff=lfs merge=lfs -text
examples/kut_pred_fls_cumhur_basbakan_audio_embed.mp4 filter=lfs diff=lfs merge=lfs -text
examples/kut_pred_fls_M6_04_16k_audio_embed.mp4 filter=lfs diff=lfs merge=lfs -text
examples/paint_boy_pred_fls_M6_04_16k_audio_embed.mp4 filter=lfs diff=lfs merge=lfs -text
out.mp4 filter=lfs diff=lfs merge=lfs -text
thirdparty/AdaptiveWingLoss/images/wflw_table.png filter=lfs diff=lfs merge=lfs -text
thirdparty/AdaptiveWingLoss/images/wflw.png filter=lfs diff=lfs merge=lfs -text
thirdparty/face_of_art/old/teaser.png filter=lfs diff=lfs merge=lfs -text
@@ -0,0 +1,74 @@
# Adobe Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, gender identity and expression, level of experience,
nationality, personal appearance, race, religion, or sexual identity and
orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language.
* Being respectful of differing viewpoints and experiences.
* Gracefully accepting constructive criticism.
* Focusing on what is best for the community.
* Showing empathy towards other community members.

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances.
* Trolling, insulting/derogatory comments, and personal or political attacks.
* Public or private harassment.
* Publishing others' private information, such as a physical or electronic
address, without explicit permission.
* Other conduct which could reasonably be considered inappropriate in a
professional setting.

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at Grp-opensourceoffice@adobe.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at [https://contributor-covenant.org/version/1/4][version].

[homepage]: https://contributor-covenant.org
[version]: https://contributor-covenant.org/version/1/4/

# MakeItTalk: Speaker-Aware Talking-Head Animation

This is the code repository implementing the paper:

> **MakeItTalk: Speaker-Aware Talking-Head Animation**

> [Yang Zhou](https://people.umass.edu/~yangzhou),
> [Xintong Han](http://users.umiacs.umd.edu/~xintong/),
> [Eli Shechtman](https://research.adobe.com/person/eli-shechtman),
> [Jose Echevarria](http://www.jiechevarria.com),
> [Evangelos Kalogerakis](https://people.cs.umass.edu/~kalo/),
> [Dingzeyu Li](https://dingzeyu.li)

> SIGGRAPH Asia 2020

> **Abstract** We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures.
In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.

> [[Project page]](https://people.umass.edu/~yangzhou/MakeItTalk/)
> [[Paper]](https://people.umass.edu/~yangzhou/MakeItTalk/MakeItTalk_SIGGRAPH_Asia_Final_round-5.pdf)
> [[Video]](https://www.youtube.com/watch?v=OU6Ctzhpc6s)
> [[arXiv]](https://arxiv.org/abs/2004.12992)
> [[Colab Demo]](quick_demo.ipynb)
> [[Colab Demo TL;DR]](quick_demo_tdlr.ipynb)


Figure. Given an audio speech signal and a single portrait image as input (left), our model generates speaker-aware talking-head animations (right).
Neither the speech signal nor the input face image is observed during model training.
Our method creates both non-photorealistic cartoon animations (top) and natural human face videos (bottom).

## Updates

- [x] Generate new puppet! (tested on Ubuntu)
- [x] Pre-trained models
- [x] Google Colab quick demo for natural faces [[detail]](quick_demo.ipynb) [[TL;DR]](quick_demo_tdlr.ipynb)
- [ ] Training code for each module

## Requirements
- Python 3.6 environment
```
conda create -n makeittalk_env python=3.6
conda activate makeittalk_env
```
- ffmpeg (https://ffmpeg.org/download.html)
```
sudo apt-get install ffmpeg
```
- Python packages
```
pip install -r requirements.txt
```
- `winehq-stable` for cartoon face warping on Ubuntu (https://wiki.winehq.org/Ubuntu). Tested on Ubuntu 16.04 with wine 5.0.3.
```
sudo dpkg --add-architecture i386
wget -nc https://dl.winehq.org/wine-builds/winehq.key
sudo apt-key add winehq.key
sudo apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ xenial main'
sudo apt update
sudo apt install --install-recommends winehq-stable
```

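Before running any of the scripts, it can help to confirm that the tools listed above are actually reachable. The following is a minimal sketch of such a preflight check, not part of the repository: it assumes the executables are named `ffmpeg` and `wine` on `PATH`, and the `--cartoon` flag belongs to this hypothetical helper only.

```python
import shutil
import sys


def preflight(require_wine=False):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # The README targets a Python 3.6 conda environment.
    if sys.version_info[:2] != (3, 6):
        problems.append("expected Python 3.6, found %d.%d" % sys.version_info[:2])
    # ffmpeg is installed via apt-get above and must be on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # wine is only needed for cartoon face warping on Ubuntu.
    if require_wine and shutil.which("wine") is None:
        problems.append("wine not found on PATH")
    return problems


if __name__ == "__main__":
    # "--cartoon" is a flag of this helper only, not of the MakeItTalk scripts.
    issues = preflight(require_wine="--cartoon" in sys.argv)
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK.")
```

The check only reports problems and never installs anything, so it is safe to run inside or outside the conda environment.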
## Pre-trained Models

Download the following pre-trained models to the `examples/ckpt` folder to animate your own portraits.

| Model | Link to the model |
| :-------------: | :---------------: |
| Voice Conversion | [Link](https://drive.google.com/file/d/1ZiwPp_h62LtjU0DwpelLUoodKPR85K7x/view?usp=sharing) |
| Speech Content Module | [Link](https://drive.google.com/file/d/1r3bfEvTVl6pCNw5xwUhEglwDHjWtAqQp/view?usp=sharing) |
| Speaker-aware Module | [Link](https://drive.google.com/file/d/1rV0jkyDqPW-aDJcj7xSO6Zt1zSXqn1mu/view?usp=sharing) |
| Image2Image Translation Module | [Link](https://drive.google.com/file/d/1i2LJXKp-yWKIEEgJ7C6cE3_2NirfY_0a/view?usp=sharing) |
| Non-photorealistic Warping (.exe) | [Link](https://drive.google.com/file/d/1rlj0PAUMdX8TLuywsn6ds_G6L63nAu0P/view?usp=sharing) |

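The table does not say which file name each download should have. The repository's file list does include `examples/ckpt/ckpt_autovc.pth`, `ckpt_content_branch.pth`, `ckpt_speaker_branch.pth`, `ckpt_116_i2i_comb.pth` and `examples/dump/emb.pickle`, each added as a 3-line file (`+3 -0`). That suggests these may be Git LFS pointer files rather than the weights themselves. The sketch below checks that each file exists and is not still an LFS pointer. The mapping from table rows to file names is inferred from the names and is an assumption. The warping `.exe` is left out because its file name is not given.

```python
import os

# File names come from the repository's file list; which table row each one
# belongs to is inferred from the names, not documented.
EXPECTED = {
    "Voice Conversion": "examples/ckpt/ckpt_autovc.pth",
    "Speech Content Module": "examples/ckpt/ckpt_content_branch.pth",
    "Speaker-aware Module": "examples/ckpt/ckpt_speaker_branch.pth",
    "Image2Image Translation Module": "examples/ckpt/ckpt_116_i2i_comb.pth",
    "Pre-trained embedding": "examples/dump/emb.pickle",
}

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_file(path):
    """Return None if `path` looks like real data, otherwise a reason string."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    if head == LFS_POINTER_PREFIX:
        return "Git LFS pointer, not the actual weights"
    return None


def check_all(root="."):
    """Map each failing entry's name to its problem; passing entries are omitted."""
    problems = {}
    for name, rel in EXPECTED.items():
        reason = check_file(os.path.join(root, rel))
        if reason is not None:
            problems[name] = reason
    return problems


if __name__ == "__main__":
    for name, reason in sorted(check_all().items()):
        print("%s: %s" % (name, reason))
```

Run from the repository root, an empty output means every expected file is present and holds more than an LFS pointer.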
## Animate Your Portraits!

- Download the pre-trained embedding [[here]](https://drive.google.com/file/d/18-0CYl5E6ungS3H4rRSHjfYvvm-WwjTI/view?usp=sharing) and save it to the `examples/dump` folder.

### _Natural Human Faces / Paintings_

- Crop your portrait image to `256x256` and save it as a `.jpg` file in the `examples` folder.
Make sure the head is roughly centered (see the existing examples for reference).

- Put your test audio files in the `examples` folder as well, in `.wav` format.

- Animate!

```
python main_end2end.py --jpg <portrait_file>
```

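The cropping step above asks for a `256x256` image with the head roughly centered, but it does not prescribe a tool. The following is a minimal sketch that assumes you already know the approximate head center `(cx, cy)` in pixels, for example by eye or from a face detector of your choice. It computes a square crop box. `centered_square_crop` is a hypothetical helper, not part of the repository.

```python
def centered_square_crop(width, height, cx, cy, side=None):
    """Return a (left, top, right, bottom) square crop box around (cx, cy).

    `side` defaults to the largest square that fits in the image. The box is
    shifted, not shrunk, when it would cross an image border, so it always
    stays square and inside the image.
    """
    limit = min(width, height)
    side = limit if side is None else min(int(side), limit)
    # Ideal box centred on the head.
    left = int(round(cx - side / 2.0))
    top = int(round(cy - side / 2.0))
    # Clamp into [0, width - side] x [0, height - side].
    left = max(0, min(left, width - side))
    top = max(0, min(top, height - side))
    return left, top, left + side, top + side


# A 640x480 photo with the head at its centre: the largest square crop.
print(centered_square_crop(640, 480, 320, 240))
```

If Pillow is installed, you can pass the returned box to `Image.crop(box)`, then call `.resize((256, 256))` and save the result as a `.jpg` in `examples`. Clamping keeps the box square, but near the image edges the head ends up slightly off-center.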
- Use the additional arguments `--amp_lip_x <x> --amp_lip_y <y> --amp_pos <pos>`
to amplify lip motion (along the x/y axes) and head-motion displacement. The default values are `<x>=2., <y>=2., <pos>=.5`.

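You can assemble the documented flags in a script to animate several portraits with the same amplification settings. This is a sketch that assumes it runs from the repository root and that `--jpg` takes a file name inside `examples`, as `<portrait_file>` above suggests. `end2end_command` and `run_all` are hypothetical helpers.

```python
import shlex
import subprocess


def end2end_command(portrait, amp_lip_x=2.0, amp_lip_y=2.0, amp_pos=0.5,
                    script="main_end2end.py"):
    """Argument list for one natural-face run; defaults mirror the README."""
    return [
        "python", script,
        "--jpg", portrait,
        "--amp_lip_x", str(amp_lip_x),
        "--amp_lip_y", str(amp_lip_y),
        "--amp_pos", str(amp_pos),
    ]


def run_all(portraits, dry_run=True, **amps):
    """Animate portraits one after another; only print commands if dry_run."""
    for portrait in portraits:
        cmd = end2end_command(portrait, **amps)
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)
```

For example, `run_all(["anne.jpg", "audrey.jpg"], amp_pos=1.0)` only prints the two commands. Add `dry_run=False` to actually launch them.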
### _Cartoon Faces_

- Put your test audio files in the `examples` folder as well, in `.wav` format.

- Animate one of the existing puppets:

| Puppet Name | wilk | smiling_person | sketch | color | cartoonM | danbooru1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Image |  |  |  |  |  |  |

```
python main_end2end_cartoon.py --jpg <cartoon_puppet_name_with_extension> --jpg_bg <puppet_background_with_extension>
```

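Both the natural-face and the cartoon pipelines read test audio as `.wav` files from `examples`. The README does not state a required sample rate or channel count. The sketch below therefore only reports them, using Python's standard `wave` module. It also builds a plain `ffmpeg` conversion command for audio in other formats. Both helper names are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_s) of a PCM .wav file.

    Raises wave.Error if the file is not a PCM RIFF/WAVE file that the
    standard library can parse (e.g. an .mp3 that was merely renamed).
    """
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        duration = w.getnframes() / float(rate)
    return channels, rate, duration


def ffmpeg_to_wav_command(src, dst):
    """Build an ffmpeg command that converts an audio/video file to .wav.

    -y overwrites `dst` if it exists; ffmpeg infers the WAV container from the
    .wav extension and by default keeps the source's sample rate and channels.
    """
    return ["ffmpeg", "-y", "-i", src, dst]
```

For example, run `subprocess.run(ffmpeg_to_wav_command("talk.mp3", "examples/talk.wav"), check=True)` and then `describe_wav("examples/talk.wav")`. The file names here are placeholders.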
- `--jpg_bg` takes an image of the same size to use as the animation's background, such as the puppet's body or an overall fixed background. If you use a background, make sure the puppet face image (i.e., the `--jpg` image) is in `png` format and is transparent outside the face area. If you don't need a background, you still have to pass a same-size image (e.g., a pure white image) as a placeholder for this argument.

- Use the additional arguments `--amp_lip_x <x> --amp_lip_y <y> --amp_pos <pos>`
to amplify lip motion (along the x/y axes) and head-motion displacement. The default values are `<x>=2., <y>=2., <pos>=.5`.

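When no background is needed, the placeholder still has to match the puppet image's size. The following is a minimal standard-library sketch. It reads the puppet's dimensions from its PNG header and writes a pure-white PNG of the same size. It assumes the puppet face image is a PNG, as required above when a background is used. The helper names and output path are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        header = f.read(24)
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("%s is not a PNG file" % path)
    return struct.unpack(">II", header[16:24])


def _chunk(kind, data):
    """Encode one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def write_white_png(path, width, height):
    """Write an opaque, pure-white 8-bit RGB PNG of the given size."""
    # IHDR: width, height, bit depth 8, colour type 2 (RGB),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Every scanline starts with filter-type byte 0 (None), then RGB bytes.
    row = b"\x00" + b"\xff" * (3 * width)
    idat = zlib.compress(row * height)
    with open(path, "wb") as f:
        f.write(PNG_SIGNATURE)
        f.write(_chunk(b"IHDR", ihdr))
        f.write(_chunk(b"IDAT", idat))
        f.write(_chunk(b"IEND", b""))


def make_blank_background(puppet_png, out_png):
    """Create a white background with the puppet image's dimensions."""
    width, height = png_size(puppet_png)
    write_white_png(out_png, width, height)
    return width, height
```

For instance, `make_blank_background("examples_cartoon/<puppet>.png", "examples_cartoon/<puppet>_bg.png")` creates a file you can pass as `--jpg_bg`. The paths are placeholders, so adjust them to wherever your puppet images live.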
### _Generate Your New Puppet_

- Put the cartoon image in the `examples_cartoon` folder.

- Install the conda environment `foa_env_py2` (tested on Python 2) for Face-of-art (https://github.com/papulke/face-of-art).
Download the pre-trained weights [here](https://www.dropbox.com/sh/hrxcyug1bmbj6cs/AAAxq_zI5eawcLjM8zvUwaXha?dl=0) and put them in `examples/ckpt`.
Activate the environment.

```
source activate foa_env_py2
```

- Create the files needed to animate your cartoon image, i.e.
`<your_puppet>_open_mouth.txt`, `<your_puppet>_close_mouth.txt`, `<your_puppet>_open_mouth_norm.txt`, `<your_puppet>_scale_shift.txt`, `<your_puppet>_delauney.txt`

```
python main_gen_new_puppet.py <your_puppet_with_file_extension>
```

- In detail, this takes three steps:
    - Face-of-art automatic cartoon landmark detection.
        - If the landmarks are wrong or inaccurate, you can use our tool to drag and refine them.
    - Estimation of the closed-mouth landmarks that serve as network input.
    - Delaunay triangulation of the image using the landmarks.

- See the puppet `smiling_person_example.png` for an example.

|  |  |  |
| :---: | :---: | :---: |
| Landmark Adjustment Tool | Closed lips estimation | Delaunay Triangulation |

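The last step triangulates the image using its landmarks. This README does not describe the repository's implementation or the layout of `<your_puppet>_delauney.txt`, so what follows is a generic Bowyer-Watson sketch, not the project's code. It takes a list of `(x, y)` landmark coordinates and returns triangles as index triples into that list.

```python
def in_circumcircle(tri, p):
    """True if point p lies strictly inside the circumcircle of triangle tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    adx, ady = ax - p[0], ay - p[1]
    bdx, bdy = bx - p[0], by - p[1]
    cdx, cdy = cx - p[0], cy - p[1]
    # Classic in-circle determinant; positive means "inside" for a
    # counter-clockwise triangle, so multiply by the orientation sign.
    det = ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady))
    orient = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return det * orient > 0


def delaunay(points):
    """Bowyer-Watson triangulation; returns sorted index triples into points."""
    pts = [(float(x), float(y)) for x, y in points]
    n = len(pts)
    if n < 3:
        return []
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    mx, my = (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0
    # Append a large "super-triangle" (indices n, n+1, n+2) enclosing all points.
    pts += [(mx - 20 * span, my - span), (mx, my + 20 * span),
            (mx + 20 * span, my - span)]
    triangles = [(n, n + 1, n + 2)]
    for i in range(n):
        p = pts[i]
        # Triangles whose circumcircle contains p violate the Delaunay property.
        bad = [t for t in triangles if in_circumcircle([pts[j] for j in t], p)]
        # Cavity boundary = edges that belong to exactly one bad triangle.
        edge_count = {}
        for a, b, c in bad:
            for e in ((a, b), (b, c), (c, a)):
                key = (min(e), max(e))
                edge_count[key] = edge_count.get(key, 0) + 1
        triangles = [t for t in triangles if t not in bad]
        # Re-fill the cavity by joining p to every boundary edge.
        for (a, b), count in edge_count.items():
            if count == 1:
                triangles.append((a, b, i))
    # Discard triangles that use any super-triangle vertex.
    return sorted(tuple(sorted(t)) for t in triangles if max(t) < n)


# Four square corners plus the centre: four triangles meeting at the centre.
print(delaunay([(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]))
```

This pure-Python version is roughly quadratic and only meant to show the idea. For point sets in general position, `scipy.spatial.Delaunay(points).simplices` should give the same triangles much faster.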
## Train

### Train Voice Conversion Module


### Train Content Branch
- Create the dataset root directory `<root_dir>`.

- Dataset: download the preprocessed dataset [[here]](https://drive.google.com/drive/folders/1EwuAy3j1b9Zc1MsidUfxG_pJGc_cV60O?usp=sharing) and put it in `<root_dir>/dump`.

- Training script: run the script below. Models will be saved in `<root_dir>/ckpt/<train_instance_name>`.

```shell script
python main_train_content.py --train --write --root_dir <root_dir> --name <train_instance_name>
```

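You can prepare the `<root_dir>` layout before launching the training command above. This sketch only creates the `dump` and `ckpt/<train_instance_name>` folders named above and assembles the documented command. `prepare_root` and `train_content_command` are hypothetical helpers. The README does not say whether the training script creates the checkpoint folder itself, but creating it in advance should be harmless.

```python
import os


def prepare_root(root_dir, name):
    """Create <root_dir>/dump and <root_dir>/ckpt/<name>; return both paths."""
    dump_dir = os.path.join(root_dir, "dump")
    ckpt_dir = os.path.join(root_dir, "ckpt", name)
    # exist_ok=True makes the call safe to repeat.
    os.makedirs(dump_dir, exist_ok=True)
    os.makedirs(ckpt_dir, exist_ok=True)
    return dump_dir, ckpt_dir


def train_content_command(root_dir, name):
    """Argument list for the documented content-branch training command."""
    return ["python", "main_train_content.py", "--train", "--write",
            "--root_dir", root_dir, "--name", name]
```

A typical use is to call `prepare_root`, copy the preprocessed dataset into the returned `dump` folder, check that it is not empty with `os.listdir`, and only then pass `train_content_command(...)` to `subprocess.run(..., check=True)`.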
### Train Speaker-Aware Branch


### Train Image-to-Image Translation


## [License](LICENSE.md)

## Acknowledgement

We would like to thank Timothy Langlois for the narration, and
[Kaizhi Qian](https://scholar.google.com/citations?user=uEpr4C4AAAAJ&hl=en)
for the help with the [voice conversion module](https://auspicious3000.github.io/icassp-2020-demo/).
We thank [Jakub Fiser](https://research.adobe.com/person/jakub-fiser/) for implementing the real-time GPU version of the triangle morphing algorithm.
We thank Daichi Ito for sharing the caricature image and Dave Werner
for Wilk, the gruff but ultimately lovable puppet.

This research is partially funded by NSF (EAGER-1942069)
and a gift from Adobe. Our experiments were performed in the
UMass GPU cluster obtained under the Collaborative Fund managed
by the MassTech Collaborative.