The hybrid ant colony optimization and ensemble method for solving the data stream e-mail foldering problem

S.I. : 2018 INDIA INTL. CONGRESS ON COMPUTATIONAL INTELLIGENCE

Jan Kozak · Przemysław Juszczuk · Barbara Probierz

Received: 3 July 2019 / Accepted: 6 December 2019 © Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract
The e-mail foldering problem is a special classification problem. It concerns a situation in which e-mail users create new folders and, at the same time, stop using some of the folders they created in the past. Additionally, messages arrive in the system at different time stamps. This article revisits the ant colony optimization algorithm for the e-mail foldering problem and proposes a novel variant of it adapted to data stream analysis. The goal of this work is to allow the classification of messages arriving in the system as data packages; however, because of the large number of decision classes (folders in the inbox), successive packages lead to a large concept drift. To ensure the stability of the algorithm, a memory represented as a pheromone trail, a concept known from ant colony optimization methods, is introduced. At the same time, multiple classifiers are combined, similar to an ensemble method. The proposed approach was tested on real-world data from the Enron e-mail dataset. The two proposed methods for data streams were analyzed and compared with methods used in the literature. The results achieved, in terms of both accuracy and stability, confirm (according to a statistical analysis) that the proposed solutions classify e-mail messages arriving in the system as data packages better than the compared methods.

Keywords Ant colony optimization · Data stream · Ensemble methods · Decision trees · Enron e-mail · ACDF
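To make the idea of a pheromone trail acting as memory across data packages more concrete, here is a minimal Python sketch under stated assumptions. It is a toy illustration, not the authors' ACDF-based method: it builds no decision trees, runs no ants, and leaves out the ensemble component. The class `PheromoneFolderMemory`, the (term, folder) edge representation, and the `evaporation` and `deposit` parameters are all hypothetical. Each labelled package first evaporates every trail and then reinforces the (term, folder) pairs it contains, so folders the user stops using gradually lose influence while recent evidence dominates.

```python
from __future__ import annotations

from collections import defaultdict


class PheromoneFolderMemory:
    """Toy pheromone memory over hypothetical (term, folder) edges.

    Illustrates pheromone-as-memory for a stream of labelled data packages;
    this is NOT the paper's ACDF-based algorithm.
    """

    def __init__(self, evaporation: float = 0.2, deposit: float = 1.0) -> None:
        if not 0.0 <= evaporation <= 1.0:
            raise ValueError("evaporation must be in [0, 1]")
        self.evaporation = evaporation  # fraction of pheromone lost per package
        self.deposit = deposit          # pheromone added per observed (term, folder) pair
        self.tau: dict[tuple[str, str], float] = defaultdict(float)

    def update(self, package: list[tuple[set[str], str]]) -> None:
        """Process one labelled package: evaporate all trails, then reinforce."""
        # Evaporation: older evidence fades, so abandoned folders lose influence.
        for edge, value in list(self.tau.items()):
            self.tau[edge] = value * (1.0 - self.evaporation)
        # Reinforcement: (term, folder) pairs seen in this package gain pheromone.
        for terms, folder in package:
            for term in terms:
                self.tau[(term, folder)] += self.deposit

    def predict(self, terms: set[str]) -> str | None:
        """Return the folder with the highest summed pheromone over the terms, or None."""
        scores: dict[str, float] = defaultdict(float)
        for (term, folder), value in self.tau.items():
            if term in terms:
                scores[folder] += value
        if not scores:
            return None  # no stored evidence for any of the message's terms
        return max(scores, key=scores.get)


if __name__ == "__main__":
    memory = PheromoneFolderMemory(evaporation=0.5)
    # Package 1: "budget" messages still go to a folder the user will later abandon.
    memory.update([({"budget", "q3"}, "old_finance"), ({"meeting"}, "calendar")])
    print(memory.predict({"budget"}))  # old_finance
    # Package 2: the user now files "budget" messages in a newly created folder.
    memory.update([({"budget"}, "finance")])
    print(memory.predict({"budget"}))  # finance
```

Evaporation is what lets such a memory follow concept drift: with `evaporation=0.5`, evidence from a package loses half its weight with every later package, so a folder that is no longer used fades over a few packages instead of being forgotten at once. A lower rate favours stability, and a higher rate favours fast adaptation to newly created folders.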

Corresponding author: Przemysław Juszczuk, [email protected]
Jan Kozak, [email protected]
Barbara Probierz, [email protected]
University of Economics in Katowice, 1 Maja 50, 40-287 Katowice, Poland

1 Introduction

Currently, e-mail is one of the most widely used methods of communication. It is easily accessible, fast, and cheap, and it allows long-distance communication among multiple people without the need to leave home or work. Electronic mail is usually considered a free service, and the recipient of an e-mail message can read it at any time. It is estimated that a typical user receives, on average, 40–50 e-mail messages daily. Some users even receive hundreds of messages daily; thus, most of their working time is dedicated to reading and answering these messages. Unfortunately, a large fraction of received messages contain unimportant information, which should be filtered out. For this reason, research dedicated to employing mechanisms allowing for automatic control of e-mail applications