AnimeGAN → [Chen et al. 2019]
Figure 1. Example of photo cartoonization with our method: left is the real-world photo used for input, right is the animation style picture output by our proposed method.

Transforming photos of real-world scenes into anime style images is a meaningful and challenging task in terms of computer vision and artistic style transfer. Our previously proposed AnimeGAN combines neural style transfer and generative adversarial network (GAN) to accomplish this task. However, AnimeGAN still has some obvious problems, such as high-frequency artifacts in the images generated by the model. Therefore, in this research, we propose an improved version of AnimeGAN, namely AnimeGANv2. It prevents the generation of high-frequency artifacts by simply changing the normalization of features in the network. In addition, we further reduce the scale of the generator network to achieve more efficient animation style transfer. AnimeGANv2 trained on the newly established high-quality dataset can generate animation images with better visual quality than AnimeGAN.

AnimeGANv2 uses layer normalization of features to prevent the network from producing high-frequency artifacts in the generated images. However, AnimeGAN is prone to generate high-frequency artifacts due to the use of instance normalization, which is the same as the reason why styleGAN generates high-frequency artifacts. In fact, total variation loss cannot completely suppress the generation of high-frequency noise. Instance normalization is generally regarded as the best normalization method in style transfer. It can make different channels in the feature map have different feature properties, thereby promoting the diversity of styles in the images generated by the model. Layer normalization can make different channels in the feature map have the same distribution of feature properties, which can effectively prevent the generation of local noise.

The network structure of the generator in AnimeGANv2 is shown in Figure 2. K represents the size of the convolution kernel, S represents the step size, C represents the number of convolution kernels, IRB represents the inverted residual block, resize represents the interpolation up-sampling method, and SUM means the element-wise addition. The generator parameter size of AnimeGANv2 is 8.6MB, and the generator parameter size of AnimeGAN is 15.8MB. AnimeGANv2 uses the same discriminator as AnimeGAN, the difference is that the discriminator uses layer normalization instead of instance normalization.

The three animation style datasets used by AnimeGANv2 are shown in Table 1. The image size used in the training is 256*256, and these style images are all from the video frames in the corresponding high-definition style movies. Figure 3 shows the qualitative results of AnimeGANv2 on the test dataset for the three animation styles.

Figure 2. Architecture of the generator network.

Table 1. Information of three anime style datasets.
Anime style Film Picture Number Quality Download Style Dataset
Miyazaki Hayao The Wind Rises 1752 1080p Link
Makoto Shinkai Your Name & Weathering with you 1445 BD
Kon Satoshi Paprika 1284 BDRip

Figure 3. The qualitative results of AnimeGANv2 on the three animation styles on the test dataset.
See Also

Jie Chen, Gang Liu, Xin Chen
"AnimeGAN: A Novel Lightweight GAN for Photo Animation."
ISICA 2019: Artificial Intelligence Algorithms and Applications pp 242-256, 2019.

「AnimeGANv2」 Xin Chen, August, 2020