A Novel Double-Tail Generative Adversarial Network for Fast Photo Animation

1Hubei University of Technology, 2Canton, 3Wuhan University
IEICE Trans. Information and Systems

The pipeline of DTGAN.
AnimeGANv3 efficiently converts photos into cartoon images.

Abstract

Photo animation transforms photos of real-world scenes into anime-style images, a challenging task in AIGC (AI-Generated Content). Although previous methods have achieved promising results, they often introduce noticeable artifacts or distortions.

In this paper, we propose a novel double-tail generative adversarial network (DTGAN) for fast photo animation. DTGAN is the third version of the AnimeGAN series, so it is also called AnimeGANv3. The generator of DTGAN has two output tails: a support tail that outputs coarse-grained anime style images and a main tail that refines them. In DTGAN, we propose a novel learnable normalization technique, termed linearly adaptive denormalization (LADE), to prevent artifacts in the generated images. To improve the visual quality of the generated anime style images, we propose two novel loss functions suited to photo animation: 1) a region smoothing loss, which weakens the texture details of the generated images to achieve anime effects with abstract details; 2) a fine-grained revision loss, which eliminates artifacts and noise in the generated anime style images while preserving clear edges. Furthermore, the generator of DTGAN is lightweight, with only 1.02 million parameters in the inference phase. The proposed DTGAN can easily be trained end-to-end with unpaired training data.
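To make the region smoothing loss concrete, here is a minimal NumPy sketch under the assumption that it penalizes the L1 distance between a generated image and a locally smoothed copy of itself, pushing the generator toward flat, abstract color regions. The box-filter smoothing operator and the absence of region weighting are illustrative simplifications; the paper's exact formulation may differ.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k x k box blur with reflect padding ('same' output size)."""
    pad = k // 2
    p = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += p[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / (k * k)

def region_smoothing_loss(gen, k=3):
    # L1 distance between the generated image and its smoothed version:
    # zero for piecewise-flat regions, large for fine texture.
    return np.abs(gen - box_blur(gen, k)).mean()
```

A perfectly flat region incurs zero loss, while textured regions are penalized, which matches the stated goal of weakening texture details.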

Extensive experiments have been conducted to qualitatively and quantitatively demonstrate that our method can produce high-quality anime style images from real-world photos and perform better than the state-of-the-art models.

The architecture of the generator and discriminator networks.

Fine-grained revision

Although the two outputs of the support tail still have significant visual quality issues, after revision with the NL-means and L0 smoothing methods, the main tail produces a high-definition cartoon image.
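The NL-means step can be illustrated with a minimal single-channel NumPy implementation. This toy version only shows the mechanism (each pixel becomes a weighted average of pixels whose surrounding patches look similar); in practice an off-the-shelf, optimized NL-means implementation would be used, and L0 smoothing is a separate step not shown here.

```python
import numpy as np

def nl_means(img, patch=3, search=5, h=0.1):
    """Toy NL-means denoising for a 2D float image.

    patch:  side length of the comparison patch
    search: side length of the search window
    h:      filtering strength (larger -> stronger smoothing)
    """
    pr, sr = patch // 2, search // 2
    pad = pr + sr
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            ci, cj = i + pad, j + pad
            ref = padded[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]
            acc, weights = 0.0, 0.0
            for di in range(-sr, sr + 1):
                for dj in range(-sr, sr + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - pr:ni + pr + 1, nj - pr:nj + pr + 1]
                    d2 = np.mean((ref - cand) ** 2)   # patch similarity
                    w = np.exp(-d2 / (h * h))
                    acc += w * padded[ni, nj]
                    weights += w
            out[i, j] = acc / weights
    return out
```

Because the averaging weights depend on patch similarity rather than spatial distance alone, flat regions are denoised while edges between dissimilar patches are largely preserved, which is why it suits the paper's "remove noise, keep clear edges" goal.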


LADE

The images produced by BN and LN contain a large number of cracks, and the images generated by IN and GN suffer from obvious visual artifacts. This demonstrates that BN, LN, IN, and GN do not handle anime style transfer well. The proposed LADE achieves superior anime stylization and, like the other normalization methods, is plug-and-play.

Visualization

AnimeGANv3 transforms European- and American-style portrait photos into Hayao Miyazaki style animation. With pleasant background music, the videos offer a relaxing viewing experience.

Photo to Hayao Style


Photo to Shinkai Style



Comparison

Comparison with current SOTA baselines. Zoom in for better visualization.

Performance

Our method achieves the lowest scores on both FID and KID to the anime image distribution, showing that DTGAN generates results most similar to anime images. AnimeGANv2 outperforms all methods on FID and KID to the photo distribution, indicating that it preserves more of the photos' content than the other methods; however, it falls short of our approach in anime style. Our model can process a 1920 × 1080 image on a GPU in only 115.50 ms, the fastest inference speed among all methods. Moreover, DTGAN has the smallest model size among all methods.
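For reference, KID is the unbiased squared MMD between feature embeddings of the two image sets (conventionally Inception features, with a cubic polynomial kernel). A minimal NumPy version, with feature extraction omitted and X, Y standing for precomputed feature matrices, looks like:

```python
import numpy as np

def poly_kernel(X, Y):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3."""
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** 3

def kid(X, Y):
    """Unbiased MMD^2 estimate between feature sets X (m, d) and Y (n, d)."""
    m, n = len(X), len(Y)
    Kxx = poly_kernel(X, X)
    Kyy = poly_kernel(Y, Y)
    Kxy = poly_kernel(X, Y)
    # exclude the diagonal self-similarities for the unbiased estimator
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()
```

Lower is better: identical feature distributions give an expected KID of zero, and the score grows as the generated-image features drift away from the anime-image features.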

BibTeX

@article{Liu2024dtgan,
  title={A Novel Double-Tail Generative Adversarial Network for Fast Photo Animation},
  author={Gang LIU and Xin CHEN and Zhixiang GAO},
  journal={IEICE Transactions on Information and Systems},
  volume={E107.D},
  number={1},
  pages={72-82},
  year={2024},
  doi={10.1587/transinf.2023EDP7061}
}