Supplementary material for "Talking Head Anime 4: Distillation for Real-Time Performance"

1   Implementation Details

Details about the model architectures, the training setups, and the computer used for evaluation can be found in supplementary.pdf.

2   Comparison Videos

Below, we show comparisons between animations generated by the THA3 baseline, the teacher model, and the student models on the three characters used for evaluation in Section 4 of the paper.

Character #1

Character #2

Character #3

3   Videos for Qualitative Evaluation Against THA3 and Other Systems

For qualitative comparisons, we used the systems to animate pictures of anime characters found in the wild.

We do not have 3D models of these characters, so we cannot generate ground truths for computing metrics or render stick-figure videos to drive AnimateAnyone. As a result, we used a 3D model of a VTuber that does not match the identity of the characters to drive AnimateAnyone, and we used the actor video to drive LivePortrait. Identity mismatch between the image being animated and the controlling video is an inherent problem with such systems. THA3 and our systems, however, circumvent this problem by controlling images through pose vectors, which are independent of identity.
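To make the pose-vector interface concrete, the following is a minimal sketch in Python. The poser object, its pose method, and the 45-element vector length are assumptions made purely for illustration; they are not the exact API of our released code.

    import numpy as np

    def animate(poser, character_image, pose_vectors):
        # Render one output frame per pose vector for a single character image.
        # Because a pose vector encodes only facial/body parameters (eye openness,
        # mouth shape, head rotation, ...) and carries no identity information,
        # the same sequence of vectors can drive any character image.
        return [poser.pose(character_image, p) for p in pose_vectors]  # hypothetical call

    # Example: a neutral pose and one with a single parameter changed.
    # The vector length (45) and the index below are illustrative only.
    neutral = np.zeros(45, dtype=np.float32)
    altered = neutral.copy()
    altered[0] = 1.0
    # frames = animate(poser, character_image, [neutral, altered])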

3.1   Character 1

Image being animated by LivePortrait
Image being animated by other systems

Comparison between outputs generated by the systems.

3.2   Character 2

Image being animated by LivePortrait
Image being animated by other systems

Comparison between outputs generated by the systems.

3.3   Character 3

Image being animated by LivePortrait
Image being animated by other systems

Comparison between outputs generated by the systems.

4   Demos

We provide two web-based demos in which the user can control characters in real time.

The demos require a PC with a moderately powerful GPU. The webcam demo, of course, also requires a webcam.
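As background, the webcam demo's control loop can be thought of as: capture a frame, estimate a pose vector from the user's face, and apply it to the character image. The sketch below illustrates this loop with OpenCV; the estimate_pose step and the poser.pose call are placeholders for illustration, not our actual demo code, which runs in the browser.

    import cv2
    import numpy as np

    def estimate_pose(frame):
        # Placeholder: a real face tracker would map the webcam frame to a
        # pose vector (head rotation, eye/mouth openness, ...).
        return np.zeros(45, dtype=np.float32)

    def run_webcam_demo(poser, character_image):
        cap = cv2.VideoCapture(0)  # open the default webcam
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                pose = estimate_pose(frame)                 # webcam frame -> pose vector
                output = poser.pose(character_image, pose)  # pose vector -> rendered frame (hypothetical API)
                cv2.imshow("character", output)
                if cv2.waitKey(1) & 0xFF == ord("q"):       # press 'q' to quit
                    break
        finally:
            cap.release()
            cv2.destroyAllWindows()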