Training a Multimodal Neural Network to Determine the Authenticity of Images



ARTIFICIAL INTELLIGENCE

O. V. Grinchuk (a,*) and V. I. Tsurkov (b,**)

(a) Moscow Institute of Physics and Technology (MIPT), Dolgoprudny, Moscow oblast, 141701 Russia
(b) Federal Research Center Informatics and Control, Russian Academy of Sciences, Moscow, 119333 Russia
*e-mail: [email protected]
**e-mail: [email protected]

Received February 6, 2020; revised February 28, 2020; accepted March 30, 2020

Abstract—The identification of attempts to substitute images plays an important role in protecting biometric systems (authorization in mobile devices, access control systems for premises, terminals with automatic access by face recognition, etc.). This study presents a new method for detecting falsified images based on processing multimodal data from a camera. A new neural network architecture is developed that aggregates features from the different modalities at all levels of the model. We also consider splitting the training sample by attack type and initializing the model with features trained on other tasks associated with facial images. Numerical experiments on real data are performed, demonstrating the successful performance of the system. The proposed model won first place in the CASIA-SURF competition for the recognition of falsified facial images.

DOI: 10.1134/S1064230720040073
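The abstract describes an architecture that aggregates per-modality features at every level of the network. A minimal NumPy sketch of that idea is shown below; the stream blocks, fusion-by-concatenation choice, and all dimensions are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def stream_block(x, weights):
    """One 'level' of a per-modality stream: linear map + ReLU
    (a stand-in for a convolutional block)."""
    return np.maximum(x @ weights, 0.0)

def fuse(features):
    """Aggregate modality features at a level by concatenation
    (one simple fusion choice; the paper's operator may differ)."""
    return np.concatenate(features, axis=-1)

# Three modalities (RGB, IR, depth), batch of 2, input dimension 8.
modalities = {m: rng.normal(size=(2, 8)) for m in ("rgb", "ir", "depth")}

dims = [8, 16, 4]  # per-level feature widths of each stream (hypothetical)
fused_levels = []
streams = dict(modalities)
for level in range(1, len(dims)):
    # Independent weights per modality stream at this level.
    w = {m: rng.normal(size=(streams[m].shape[-1], dims[level]))
         for m in streams}
    streams = {m: stream_block(streams[m], w[m]) for m in streams}
    # Collect the fused representation at every level, as the abstract
    # describes aggregation "at all levels of the model".
    fused_levels.append(fuse(list(streams.values())))

# A classifier head would consume the fused features from all levels.
print([f.shape for f in fused_levels])  # [(2, 48), (2, 12)]
```

The key point the sketch illustrates is that each modality keeps its own feature stream, while fused (here, concatenated) features are extracted at every depth rather than only at the final layer.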

INTRODUCTION

Together with the rapid development of biometric authentication technologies, there is an urgent need for protection against attempts to bypass face recognition systems. Before sending biometric samples to the identity verification procedure, the security system must be able to determine what is in front of the camera: a living person or a falsified object such as a printed photograph, a video replayed from a device's screen, a three-dimensional silicone mask, or another spoofing medium.

Modern face recognition systems can make finer distinctions than humans in this area [1]. A significant part of this success is due to the availability of large labeled datasets [2, 3], usually collected from the Internet. In contrast to the face recognition task, datasets for the task of determining authenticity (liveness) require careful manual data collection, since images of various falsifications are not freely available. Such data are collected in laboratories with invited participants and are therefore very limited in quantity and variety, which eliminates the advantages neural networks enjoy when working with ordinary color images.

However, to determine liveness, we can use not only ordinary cameras but also special sensors that provide additional modalities for analysis. These modalities add information and can improve liveness detection. For example, an infrared (IR) camera is insensitive to the screens of electronic devices and automatically protects against possible forgeries of this type. A depth camera allows us to obtain a three-dimensional image of the object, making it easier to detect flat (di