A Proposal: High-Throughput Robust Architecture for Log Analysis and Data Stream Mining


Abstract

Various data mining approaches are now available that can handle large static data sets despite limited computational resources. However, these approaches fall short when mining high-speed, endless streams: although their learning procedures are simple, they require the entire training process to be repeated for each newly arriving instance. Continuous data streams raise several challenges. They are many times larger than the available memory, they arrive in real time, each new instance should be inspected at most once, and predictions must be made continuously. A further issue with continuous real-time data is that concepts change over time, a phenomenon often called concept drift. This paper addresses these problems by proposing a real-time, scalable, and robust architecture. It is a general-purpose architecture, based on online machine learning, that efficiently logs and mines stream data in a fault-tolerant manner. It consists of two frameworks: (1) an event aggregation framework, which reliably collects events and messages from multiple sources and ships them to a destination for processing; and (2) a real-time computation framework, which processes streams online to extract information patterns. It guarantees reliable processing of billions of messages per second. Furthermore, it facilitates the evaluation of stream learning algorithms and offers change detection strategies to detect concept drift.
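The abstract mentions change detection strategies for concept drift but does not say which ones the architecture uses. As one hedged illustration, the sketch below implements the Drift Detection Method (DDM) of Gama et al. (2004), a widely used detector that watches a classifier's online error rate; it is not necessarily the strategy the authors adopt. The class name `DDM`, its parameter names, and the defaults (30 warm-up instances, warning at 2 standard deviations, drift at 3) are assumptions made for this example. The detector keeps only a few counters and looks at each outcome once, so it respects the single-pass, bounded-memory constraint stated in the abstract.

```python
import math


class DDM:
    """Drift Detection Method (Gama et al., 2004), illustrative sketch.

    Tracks the error rate p of an online classifier and its standard
    deviation s = sqrt(p * (1 - p) / n). A warning is signalled when
    p + s >= p_min + warning_level * s_min, and a drift when
    p + s >= p_min + drift_level * s_min, where (p_min, s_min) is the
    point at which p + s was smallest since the last reset.
    """

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # warm-up before any signal
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.p = 0.0           # running error rate
        self.p_min = math.inf  # error rate at the best point seen so far
        self.s_min = math.inf  # standard deviation at that point

    def update(self, error):
        """Consume one prediction outcome (1 = wrong, 0 = correct) exactly once.

        Returns "drift", "warning" or "stable".
        """
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean, O(1) memory
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s >= self.p_min + self.drift_level * self.s_min:
            self.reset()
            return "drift"
        if self.p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


if __name__ == "__main__":
    # Synthetic error stream: about 10% errors, then an abrupt jump to 50%.
    errors = [1 if i % 10 == 0 else 0 for i in range(1000)]
    errors += [1 if i % 2 == 0 else 0 for i in range(1000)]
    detector = DDM()
    for t, e in enumerate(errors):
        state = detector.update(e)
        if state != "stable":
            print(t, state)
        if state == "drift":
            break
```

In a test-then-train (prequential) evaluation loop, the learner would first predict each arriving instance, the resulting 0/1 error would be passed to `update`, and only then would the model be trained on that instance. On a `"drift"` signal, a typical reaction is to reset the learner or retrain it on recent data; instances seen during the `"warning"` phase are a common choice for that window.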

A.R. Hussain (corresponding author), Research & Development, Host Analytics Software Pvt. Ltd., Hyderabad 500 081, AP, India; e-mail: adnanrashid.ar@gmail.com
M.A. Hameed, Department of Computer Science, University College of Engineering, Osmania University, Hyderabad, India; e-mail: hameed@gmail.com
S. Fatima, Department of Computer Science, M.J College of Engineering and Technology, Hyderabad, India; e-mail: sana_maseeh@yahoo.com

© Springer Science+Business Media Singapore 2016
H.S. Saini et al. (eds.), Innovations in Computer Science and Engineering, Advances in Intelligent Systems and Computing 413, DOI 10.1007/978-981-10-0419-3_36


A.R. Hussain et al.

Keywords: Online · Throughput · Machine learning · Log analysis · Concept drift · Real-time · Robust · Stream mining

1 Introduction

A growing number of emerging business and scientific applications, such as satellite radar, stock markets, transaction web logs, real-time surveillance systems, telecommunication systems, sensor networks [1, 2], and other dynamic environments, generate massive amounts of data. Such a continuously generated, real-time, unbounded sequence of data is called a data stream [1–4]. In the last decade, log processing and data stream mining have received considerable research attention. Mining streams is valuable because it extracts the knowledge needed to make crucial decisions in real time. However, log analysis and the extraction of information structures such as models and patterns pose several challenges, including storage, computation, and querying. Due to huge memory requirements and h
