ORIGINAL ARTICLE
Dense feature pyramid network for cartoon dog parsing
Jerome Wan¹ · Guillaume Mougeot¹ · Xubo Yang¹
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Jerome Wan: [email protected]
Guillaume Mougeot: [email protected]
Xubo Yang: [email protected]
¹ Shanghai Jiaotong University, Shanghai, China
Abstract
While traditional cartoon character drawings are simple for humans to create, they remain highly challenging for machines to interpret. Parsing alleviates this issue through fine-grained semantic segmentation of images. Although parsing is well studied on naturalistic images, research on cartoon parsing is very sparse. Due to the lack of available datasets and the diversity of artwork styles, cartoon character parsing is more difficult than the well-known human parsing task. In this paper, we study one type of cartoon instance: cartoon dogs. We introduce a novel dataset for cartoon dog parsing and create a new deep convolutional neural network (DCNN) to tackle the problem. Our dataset contains 965 precisely annotated cartoon dog images with seven semantic part labels. Our new model, called dense feature pyramid network (DFPnet), makes use of recent popular techniques in semantic segmentation to efficiently handle cartoon dog parsing. We achieve a mean intersection over union (mIoU) of 68.39%, a mean accuracy of 79.4% and a pixel accuracy of 93.5% on our cartoon dog validation set. Our method outperforms state-of-the-art models for similar tasks trained on our dataset: CE2P for single human parsing and Mask R-CNN for instance segmentation. We hope this work can serve as a starting point for future research on digital artwork understanding with DCNNs. Our DFPnet and dataset will be made publicly available.

Keywords Cartoon character parsing · Semantic part segmentation · Pyramid network · Encoder–decoder · Deep learning for vision
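For readers unfamiliar with the three metrics reported above, the sketch below shows how pixel accuracy, mean (per-class) accuracy, and mean intersection over union (mIoU) are conventionally computed from a confusion matrix. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the `parsing_metrics` helper and the assumption of a background label alongside the seven part labels are ours.

```python
# Minimal sketch of the standard parsing metrics (not the authors' code).
# Assumes integer label maps `pred` and `gt` of the same shape, with
# `num_classes` labels, e.g., seven part labels plus a background class.
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Accumulate a num_classes x num_classes confusion matrix (rows = ground truth)."""
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def parsing_metrics(conf):
    """Return pixel accuracy, mean per-class accuracy, and mean IoU."""
    tp = np.diag(conf).astype(float)
    pixel_acc = tp.sum() / conf.sum()
    class_acc = tp / np.maximum(conf.sum(axis=1), 1)  # per-class recall
    iou = tp / np.maximum(conf.sum(axis=1) + conf.sum(axis=0) - tp, 1)
    return pixel_acc, class_acc.mean(), iou.mean()
```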
1 Introduction

Cartoons are a cultural heritage that constantly gains popularity with the rapid development of the animation industry. However, manually drawing cartoons remains challenging and time-consuming. Artists rely more and more on digital tools to save time, thus creating a wealth of artwork in digital format. Analyzing and understanding this type of data could lead to a variety of future applications. For instance, an animation can be created by reusing the segmented body parts of a hand-drawn character [1]. Other well-known research topics, such as 2D-to-3D reconstruction, require a 2D segmentation of the target object as a prerequisite to generating its 3D representation. Entem et al. [2] and Weng et al. [3] base their 3D reconstruction on the segmentation of the 2D object contour.
Another work from Entem et al. [4] and a study from Feng et al. [5] enhance the previous methods by providing segmented parts of the cartoon object. Some research may have already studied this challenging issue; nevertheless, little of it is public in the computer science community because of commercial competition in the industry. In this paper, we introduce a novel dataset and a new model to automatically parse cartoon dog images.
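The DFPnet architecture itself is detailed later in the paper. As generic background for the feature-pyramid idea its name refers to, the following is a hypothetical sketch of a pyramid-style decoder for part segmentation; the `PyramidDecoder` class, the channel sizes, and the choice of eight output classes are illustrative assumptions, not the authors' design.

```python
# Hypothetical feature-pyramid decoder sketch (illustrative only, not DFPnet).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidDecoder(nn.Module):
    """Fuse multi-scale encoder features top-down, then predict part labels."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), mid=256, num_classes=8):
        super().__init__()
        # 1x1 lateral convolutions project each encoder stage to `mid` channels.
        self.lateral = nn.ModuleList([nn.Conv2d(c, mid, 1) for c in in_channels])
        self.classify = nn.Conv2d(mid, num_classes, 1)

    def forward(self, features):
        # `features`: encoder outputs ordered from high to low spatial resolution,
        # e.g., the four stages of a ResNet backbone.
        laterals = [conv(f) for conv, f in zip(self.lateral, features)]
        x = laterals[-1]  # start from the coarsest, most semantic level
        for lat in reversed(laterals[:-1]):
            # Upsample the coarser map and merge it with the finer lateral feature.
            x = lat + F.interpolate(x, size=lat.shape[-2:], mode="bilinear",
                                    align_corners=False)
        # Per-pixel class logits at the finest pyramid resolution; a final
        # upsampling to the input image size is omitted for brevity.
        return self.classify(x)
```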