Supervised Web Document Classification Using Discrete Transforms, Active Hypercontours and Expert Knowledge


1 Institute of Computer Science, Technical University of Lodz, Wolczanska 215, 93-005 Lodz, Poland
2 Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland

Abstract. In this paper, a new method of supervised classification of documents is proposed. It utilizes discrete transforms to extract features from classified objects and adopts adaptive potential active hypercontours (APAH) for document classification. The idea of APAH generalizes classic contour methods of image segmentation. It has two main advantages: it can use almost any knowledge during the search for an optimal classification function, and it can operate in a feature space where only a metric is defined. Here, both are utilized: the first by using expert knowledge about the significance of documents from the training set, and the second by inducing new metrics in feature spaces. The method has been evaluated on a subset of the Open Directory Project (ODP) database and compared with k-NN, a well-known classification technique.

1 Introduction

The rapid development of Web Intelligence (WI) [1,2,3,4,5,6,7,8,9] technologies increases the amount of reliable knowledge that can be used to improve the efficiency of many standard tasks in artificial intelligence, from which WI can in turn benefit. This makes it necessary either to create new methods that can effectively adapt knowledge coming from different sources or to modify existing techniques to satisfy Web Intelligence requirements. The presented approach combines experience from domains that have so far been considered separately, yielding mechanisms capable of utilizing external knowledge in an efficient and flexible way.

The paper is organized as follows: Section 2 states the problem of document classification, Section 3 describes integral spatial transformations using kernel methods for feature extraction, and Section 4 presents the adaptive potential active hypercontour algorithm used to construct an optimal classifier. The next two sections present the data used in the experiments and discuss the obtained results, respectively. The paper concludes with a summary of the proposed method.

N. Zhong et al. (Eds.): WImBI 2006, LNAI 4845, pp. 305–323, 2007. © Springer-Verlag Berlin Heidelberg 2007

P.S. Szczepaniak, A. Tomczyk, and M. Pryczek

2 Supervised and Unsupervised Document Classification

2.1 Classification

The classification problem can be formulated as the task of assigning a proper label $l$ from the finite set of labels $\mathcal{L}$ (where e.g. $\mathcal{L} = \{1, \dots, L\}$ and $L$ is the number of classes) to each object $o$ from the given set of objects $\mathcal{O}$. Such an assignment can formally be described as a classification function (classifier) $k : \mathcal{O} \to \mathcal{L}$ (each object $o \in \mathcal{O}$ receives a unique label $l \in \mathcal{L}$). Because there are many functions $k \in \mathcal{K}$ that map $\mathcal{O}$ into $\mathcal{L}$ (where $\mathcal{K}$ denotes the set of all possible classifiers in a given problem) the problem of construction
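As an illustration of the definition above, a classifier can be written as an ordinary function from objects to labels. The sketch below builds one such function with k-NN, the baseline technique named in the abstract, and it needs nothing from the object space except a metric. The names `make_knn_classifier` and `manhattan`, the toy points, and the choice of three neighbors are assumptions made for this example only and are not taken from the paper.

```python
from collections import Counter


def manhattan(x, y):
    """Illustrative metric on tuples of numbers (an assumption, not the paper's metric)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def make_knn_classifier(training_set, metric, n_neighbors=3):
    """Return a classifier k: objects -> labels built from labeled examples.

    training_set is a list of (object, label) pairs; metric(a, b) returns
    a non-negative distance. Only the metric is used to compare objects.
    """
    if not 1 <= n_neighbors <= len(training_set):
        raise ValueError("n_neighbors must be between 1 and len(training_set)")

    def classify(obj):
        # Sort training examples by distance to obj and keep the closest ones.
        nearest = sorted(training_set, key=lambda pair: metric(obj, pair[0]))
        votes = Counter(label for _, label in nearest[:n_neighbors])
        # Majority vote; ties go to the label seen first among the neighbors.
        return votes.most_common(1)[0][0]

    return classify


# Toy training set with labels drawn from the label set {1, 2}.
training = [
    ((0, 0), 1), ((0, 1), 1), ((1, 0), 1),
    ((5, 5), 2), ((5, 6), 2), ((6, 5), 2),
]
k = make_knn_classifier(training, manhattan, n_neighbors=3)
label_near_origin = k((0.2, 0.1))
label_near_five = k((5.5, 5.2))
```

Different training sets, metrics, or neighbor counts give different functions `k`, which is one concrete way to see that many classifiers map the same object set into the same label set.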
