Accelerating ELM training over data streams
ORIGINAL ARTICLE

Hangxu Ji1 · Gang Wu1 · Guoren Wang2

Received: 31 January 2019 / Accepted: 12 June 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract

In the field of machine learning, offline training and online training are equally important because they coexist in many real applications. The extreme learning machine (ELM) offers fast learning speed and high accuracy for offline training, and the online sequential ELM (OS-ELM) is a variant of ELM that supports online training. With the explosive growth of data volume, running these algorithms on distributed computing platforms has become an unstoppable trend, but no efficient distributed framework currently supports both ELM and OS-ELM. Apache Flink is an open-source stream-based distributed platform for both offline and online data processing with good scalability, high throughput, and fault tolerance, so it can be used to accelerate both ELM and OS-ELM. In this paper, we first study the characteristics of ELM, OS-ELM and distributed computing platforms, and then propose an efficient stream-based distributed framework for both ELM and OS-ELM, named ELM-SDF, implemented on Flink. We then evaluate the algorithms in this framework with synthetic data on a distributed cluster. In summary, the advantages of the proposed framework are as follows. (1) The training speed of FLELM, the Flink-based ELM in ELM-SDF, is always faster than that of ELM on Hadoop and Spark, and it also scales better. (2) FLOS-ELM, the Flink-based OS-ELM in ELM-SDF, achieves better response time and throughput than OS-ELM on Hadoop and Spark when incremental training samples arrive. (3) The response time and throughput of FLOS-ELM improve further in native-stream processing mode when incremental data samples arrive continuously.

Keywords Extreme learning machine · Offline training · Online training · Flink
* Corresponding author: Gang Wu, [email protected]
  Hangxu Ji, [email protected]
  Guoren Wang, [email protected]

1 School of Computer Science and Engineering, Northeastern University, Shenyang, China
2 School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

1 Introduction

During the past decade, with the rapid development of the Internet and information technology, big data and artificial intelligence (AI) have become the largest technological trends in computer science. Machine learning is one of the most important fields of AI, and its largest challenges are analyzing and mining large amounts of historical and incremental data. As an emerging machine learning method, the extreme learning machine (ELM), which was proposed for training single hidden layer feedforward neural networks (SLFNs) [1–5], solves this problem well because it exhibits strong advantages in terms of training efficiency, accuracy, and generalization performance. In this algorithm, the hidden-layer nodes are randomly initialized, and no iterative steps are required in the calculation process. In contrast to
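To make the closed-form nature of ELM training and the incremental nature of OS-ELM concrete, the following minimal single-machine NumPy sketch shows both steps. It is an illustrative assumption, not the paper's Flink-based FLELM/FLOS-ELM implementation; the function names, the sigmoid activation, and the hidden-layer size are chosen only for this example.

```python
# Minimal single-machine sketches of ELM and the OS-ELM update (NumPy).
# NOT the paper's distributed FLELM/FLOS-ELM operators; names and the
# sigmoid activation are illustrative assumptions.
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def elm_train(X, T, n_hidden=64, seed=0):
    """Batch ELM: random hidden layer, output weights in one closed-form step."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never tuned
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = sigmoid(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose solution, no iterations
    return W, b, beta

def oselm_update(X_new, T_new, W, b, beta, P):
    """One OS-ELM step: fold a newly arrived chunk into beta without retraining."""
    H = sigmoid(X_new @ W + b)
    # Recursive least-squares update of the inverse-covariance matrix P.
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T_new - H @ beta)
    return beta, P
```

In an online setting, the batch step is run once on an initial block to obtain beta together with P = inv(H0.T @ H0); each arriving chunk is then absorbed by oselm_update. The paper's contribution, as described in the abstract, is to execute these matrix computations in a distributed, stream-based fashion on Flink rather than on a single machine.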