New Options for Hoeffding Trees
Abstract. Hoeffding trees are state-of-the-art for processing high-speed data streams. Their ingenuity stems from updating sufficient statistics, only addressing growth when decisions can be made that are guaranteed to be almost identical to those that would be made by conventional batch learning methods. Despite this guarantee, decisions are still subject to limited lookahead and stability issues. In this paper we explore Hoeffding Option Trees, a regular Hoeffding tree containing additional option nodes that allow several tests to be applied, leading to multiple Hoeffding trees as separate paths. We show how to control tree growth in order to generate a mixture of paths, and empirically determine a reasonable number of paths. We then empirically evaluate a spectrum of Hoeffding tree variations: single trees, option trees and bagged trees. Finally, we investigate pruning. We show that on some datasets a pruned option tree can be smaller and more accurate than a single tree.
1 Introduction
When training a model on a data stream it is important to be conservative with memory and to make a single scan of the data as quickly as possible. Hoeffding trees [6] achieve this by accumulating sufficient statistics from examples in a node until they can be used to make a sensible split decision, one that is guaranteed to be almost as reliable as the decision that would be made by an algorithm able to see all of the data. The sufficient statistics turn out to be beneficial for both tree growth and prediction, as they can be used to build Naive Bayes models at the leaves of the tree that are more accurate than majority-class estimates [7]. There remain situations where Naive Bayes leaves are not superior to majority class, and [8] shows that adaptively deciding per leaf when their use is beneficial can further enhance classification performance.
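The guarantee rests on the Hoeffding bound of [6]: after n independent observations of a real-valued quantity with range R, the true mean lies within \epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)} of the observed mean with probability 1 - \delta, so a leaf is split on the highest-scoring attribute once its score beats the runner-up's by more than \epsilon. The following minimal sketch (hypothetical function names, not the implementation evaluated here) illustrates the test; in practice [6] also splits when \epsilon falls below a small tie-breaking threshold.

import math

def hoeffding_bound(value_range, delta, n):
    # Epsilon such that the true mean of a variable with the given range,
    # observed n times, lies within epsilon of the sample mean with
    # probability 1 - delta.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(gain_best, gain_second, n, num_classes, delta=1e-7):
    # For information gain the range R is log2(number of classes).
    epsilon = hoeffding_bound(math.log2(num_classes), delta, n)
    return gain_best - gain_second > epsilon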
Given this sequence of improvements to the basic algorithm, is it possible to improve predictive performance further? Contemporary machine learning would suggest that the next logical step is to employ ensembles of Hoeffding trees using a framework such as bagging or boosting. In part, ensemble methods seek to overcome problems inherent in all greedy tree learners. Tree learners have limited lookahead ability (the best-scoring single test is chosen based only on the children that test would generate) and they are not stable. Stability [2] relates to the selection of a particular attribute split when the scores of other attribute splits are close. The examples processed just prior to the split decision could have an undue influence over it, and it may not be the best decision in the longer term. Breiman [4] demonstrates just how unstable this influence can be, especially when marginal decisions are taken near the root of the tree. Even though decisions are made with more data in a Hoeffding tree, they are still prone to these issues.
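To make the option-node idea from the abstract concrete, the sketch below (hypothetical types, not the authors' code) contrasts a standard split node, which routes an example down exactly one branch, with an option node, which applies several alternative tests so that a single example reaches multiple leaves, i.e. multiple paths, whose predictions can then be combined.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

@dataclass
class Leaf:
    # Per-class counts; prediction could use majority class or a
    # Naive Bayes model built from the sufficient statistics.
    class_counts: Dict[str, int] = field(default_factory=dict)

@dataclass
class SplitNode:
    test: Callable[[dict], int]   # maps an example to one child index
    children: List["Node"]

@dataclass
class OptionNode:
    options: List["Node"]         # several alternative subtrees

Node = Union[Leaf, SplitNode, OptionNode]

def reachable_leaves(node, x):
    # A split node sends x down a single branch; an option node sends
    # it down every option, so x can reach several leaves (paths).
    if isinstance(node, Leaf):
        return [node]
    if isinstance(node, SplitNode):
        return reachable_leaves(node.children[node.test(x)], x)
    return [leaf for opt in node.options for leaf in reachable_leaves(opt, x)]

Combining the per-leaf predictions, for example by averaging class distributions across the reached leaves, yields the mixture of paths referred to in the abstract.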