3D Model Retrieval Using Bipartite Graph Matching Based on Attention
Shanlin Sun1 · Yun Li1 · Yunfeng Xie1 · Zhicheng Tan1 · Xing Yao1 · Rongyao Zhang2
© Springer Science+Business Media, LLC, part of Springer Nature 2019
Abstract In this paper, we propose an attention-based bipartite graph 3D model retrieval algorithm, in which a many-to-many matching method, weighted bipartite graph matching, is employed to compare two 3D models. Since panoramic views capture the spatial and structural information of a shape, we use panoramic views to represent each 3D model. An attention mechanism is used to generate a weight for every view of each model. We then construct a weighted bipartite graph from the views of the two models and the weight of each view, and the matching result on this graph is used to measure the similarity between the two 3D models. We evaluate our method on the ModelNet, NTU and ETH datasets, and the experimental results and comparisons with other methods show its effectiveness.

Keywords 3D model retrieval · Bipartite graph matching · Attention mechanism
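The following is a minimal sketch of the retrieval pipeline summarized in the abstract, assuming each model has already been rendered into panoramic views whose feature vectors are given (e.g., extracted by a CNN). The softmax attention scoring, the cosine-similarity edge weights, and all function names here are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def attention_weights(views: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Softmax attention over views; `w` is a learned scoring vector."""
    scores = views @ w                     # (n_views,)
    scores -= scores.max()                 # numerical stability
    e = np.exp(scores)
    return e / e.sum()


def model_similarity(views_a, views_b, w) -> float:
    """Similarity between two models via weighted bipartite graph matching."""
    alpha = attention_weights(views_a, w)  # weights for model A's views
    beta = attention_weights(views_b, w)   # weights for model B's views
    # Cosine similarity between every pair of views.
    a = views_a / np.linalg.norm(views_a, axis=1, keepdims=True)
    b = views_b / np.linalg.norm(views_b, axis=1, keepdims=True)
    sim = a @ b.T                          # (n_a, n_b)
    # Edge weight combines view importance with pairwise view similarity.
    edge = np.outer(alpha, beta) * sim
    # Maximum-weight matching on the bipartite view graph (the Hungarian
    # algorithm minimizes cost, so negate the weights).
    rows, cols = linear_sum_assignment(-edge)
    return float(edge[rows, cols].sum())


# Usage: two models, each represented by 6 views of 128-D features.
rng = np.random.default_rng(0)
va, vb = rng.normal(size=(6, 128)), rng.normal(size=(6, 128))
w = rng.normal(size=128)
print(model_similarity(va, vb, w))
```

The many-to-many character of the comparison comes from the assignment step: every view of one model is matched to a distinct view of the other, so no single viewpoint dominates the similarity score.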
1 Introduction

With the rapid development of 3D technologies, computer graphics hardware and networks, 3D objects have been widely used in a wealth of applications [4,37], especially in architecture design [6,35], movie production, 3D graphics, and the medical industry, which creates a pressing need for effective and efficient 3D object classification and retrieval. With the prevalence of deep learning, various deep networks have been investigated for 3D model recognition and retrieval, such as PointNet [12], 3D ShapeNets [36], VoxNet [23], and RotationNet [16]. At the same time, view-based methods have also been improved. Su et al. [31] proposed a novel CNN architecture (MVCNN) to handle the multiple views of a 3D model, extracting their
Corresponding author: Yun Li ([email protected])

1 College of Electronic Information and Automation, Guilin University of Aerospace Technology, Guilin 541004, Guangxi, China
2 College of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, Guangxi, China
information as the 3D model descriptor. MVCNN focuses on fusing the feature maps before the fully connected layers, whereas RotationNet [16] does not. RotationNet treats the viewpoint labels as latent variables, taking multi-view images of an object as input and predicting both its pose and its object category. Liu et al. [22] proposed to leverage a hidden conditional random field (HCRF) to model the latent visual correlations among the 2D views of a 3D model, so as to guarantee the robustness of the final feature vector; their experiments also demonstrate the performance of this method. Guo et al. [13] proposed a deep embedding network jointly supervised by a classification loss and a triplet loss to map the high-dimensional image space into a low-dimensional feature space, which reduces the intra-class variations while increasing the inter-class variations of the input images. The network can guarantee that similar images are mapped close to each other in the embedding space.
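As an illustration of the two view-based ideas above, the sketch below shows MVCNN-style view pooling (fusing per-view feature maps by an element-wise max before the fully connected layers) and a triplet loss of the kind used to supervise the embedding network of Guo et al. [13]. All shapes, dimensions, and the margin value are illustrative assumptions.

```python
import torch

# Per-view CNN feature maps for one 3D model (12 views, 512 channels, 7x7);
# these would come from a shared convolutional backbone.
n_views, channels, h, w = 12, 512, 7, 7
view_features = torch.randn(n_views, channels, h, w)

# MVCNN-style view pooling: an element-wise max across the view dimension
# fuses all feature maps into one, which a fully connected layer then
# turns into a single descriptor for the whole model.
pooled = view_features.max(dim=0).values               # (channels, h, w)
descriptor = torch.nn.Linear(channels * h * w, 256)(pooled.flatten())

# Triplet loss: pull an anchor toward a positive (same class) and push it
# away from a negative (different class) by at least a margin, so similar
# images end up close together in the low-dimensional feature space.
anchor, positive, negative = torch.randn(3, 256)
loss = torch.nn.functional.triplet_margin_loss(
    anchor.unsqueeze(0), positive.unsqueeze(0), negative.unsqueeze(0),
    margin=1.0)
```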