Two-stage adaptive integration of multi-source heterogeneous data based on an improved random subspace and prediction of


S.I. : SPIOT 2020

Two-stage adaptive integration of multi-source heterogeneous data based on an improved random subspace and prediction of default risk of microcredit

Anzhong Huang1 • Fei Wu2

Received: 16 August 2020 / Accepted: 27 October 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
Some scholars have shown that machine learning methods based on single-source data can successfully monitor the risks of formal financial activities, but not those of informal financial activities. This is because the data generated by formal financial activities, whether structured or unstructured, are high in both quality and quantity, while the data generated by informal financial activities are not. Multi-source data are therefore the key to monitoring the risks of informal financial activities through machine learning. Although a few studies have attempted to use multi-source data for financial risk prediction, they simply stack the obtained data and ignore its original sources, heterogeneity, mutual redundancy and other characteristics, so the improvement in prediction performance is not obvious. This paper therefore constructs the TSAIB_RS method, based on the two-stage adaptive integration of multi-source heterogeneous data, in which data with different sources and different distributions are adaptively integrated. To test the reliability of the TSAIB_RS method, the paper takes the default risk of microcredit in China as the test target and compares the prediction results of various methods. It concludes that the TSAIB_RS method can significantly improve prediction performance.

Keywords Multi-source heterogeneous data • Adaptive integration • Microcredit risk
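To make the contrast between simply stacking multi-source data and keeping track of its sources more concrete, the following is a minimal sketch of a plain random subspace ensemble in which each feature subspace is drawn separately inside each source. It is not the authors' TSAIB_RS method, whose two stages are not described in this excerpt. The source names, the nearest-centroid base learner, the sampling ratio and the unweighted vote are all assumptions chosen for illustration.

```python
import random

def sample_subspace(source_features, ratio, rng):
    """Draw one feature subspace, sampling separately inside each source.

    source_features maps a (hypothetical) source name to the column indices
    that came from that source, so no source is left out of a subspace.
    """
    subspace = []
    for name in sorted(source_features):
        cols = source_features[name]
        k = max(1, round(ratio * len(cols)))  # at least one feature per source
        subspace.extend(rng.sample(cols, k))
    return sorted(subspace)

def fit_centroids(X, y, cols):
    """Assumed base learner: per-class mean vector on the chosen columns."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(cols))
        for j, c in enumerate(cols):
            acc[j] += row[c]
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_centroids(centroids, row, cols):
    """Return the class whose centroid is nearest (squared Euclidean)."""
    def dist(label):
        return sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[label]))
    return min(sorted(centroids), key=dist)

def fit_ensemble(X, y, source_features, n_learners=15, ratio=0.5, seed=0):
    """Train one base learner per randomly drawn, source-aware subspace."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_learners):
        cols = sample_subspace(source_features, ratio, rng)
        ensemble.append((cols, fit_centroids(X, y, cols)))
    return ensemble

def predict_ensemble(ensemble, row):
    """Combine the base learners by unweighted majority vote."""
    votes = {}
    for cols, centroids in ensemble:
        label = predict_centroids(centroids, row, cols)
        votes[label] = votes.get(label, 0) + 1
    return max(sorted(votes), key=lambda label: votes[label])

# Toy data (invented): columns 0-1 come from a hypothetical "loan_form"
# source, columns 2-4 from a hypothetical "text" source; label 1 = default.
X = [[0.1, 0.2, 0.0, 0.1, 0.3], [0.2, 0.1, 0.1, 0.0, 0.2],
     [0.9, 0.8, 1.0, 0.9, 0.7], [0.8, 0.9, 0.9, 1.0, 0.8]]
y = [0, 0, 1, 1]
sources = {"loan_form": [0, 1], "text": [2, 3, 4]}
model = fit_ensemble(X, y, sources)
print(predict_ensemble(model, [0.15, 0.1, 0.05, 0.1, 0.2]))
print(predict_ensemble(model, [0.85, 0.9, 0.95, 0.9, 0.8]))
```

Sampling inside each source, rather than from one stacked pool of columns, is one simple way to keep a small source from being crowded out by a large one. An adaptive scheme would presumably go further and adjust the sampling or the vote weights per source, but how TSAIB_RS does so cannot be inferred from the abstract alone.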

Fei Wu [email protected]

1 School of Economics and Management, Jiangsu University of Science and Technology, Zhenjiang 212003, China

2 School of Law, Shanghai University of Finance and Economics, Shanghai 200433, China

1 Introduction

Information asymmetry is the root cause of financial risks, and obtaining as much information as possible is the key to predicting them. For this reason, some scholars raised the problem of multi-source information in financial risk monitoring early on [1, 2], arguing that banks should use not only hard information (financial statement information) but also soft information (non-financial statement information) to reduce credit risk. However, soft information is often unstructured data, which cannot be used by statistics and econometrics, the traditional financial risk prediction methods. This greatly limits the improvement of financial risk prediction accuracy, because a large amount of information in the Internet era is unstructured. Machine learning is therefore an excellent supplement to the traditional methods of financial risk prediction. As for the relationship between the data used in machine learning and the prediction effect, Tsai found that the algorithm of risk
