Hierarchical Sparse Dictionary Learning



1 Electrical and Computer Engineering Department, North Carolina State University, Raleigh, NC 27695, USA
  [email protected]
2 Department of Computer and Information Science, IUPUI, Indianapolis, IN 46202, USA
  [email protected]
3 Autonomic Management Department, NEC Labs America, Princeton, NJ 45237, USA
  [email protected]

Abstract. Sparse coding plays a key role in high dimensional data analysis. One critical challenge of sparse coding is to design a dictionary that is both adaptive to the training data and generalizable to unseen data of the same type. In this paper, we propose a novel dictionary learning method that builds an adaptive dictionary regularized by an a-priori over-complete dictionary. This leads to a sparse structure of the learned dictionary over the a-priori dictionary, and a sparse structure of the data over the learned dictionary. We apply the hierarchical sparse dictionary learning approach to both synthetic data and real-world high-dimensional time series data. The experimental results demonstrate that the hierarchical sparse dictionary learning approach reduces overfitting and enhances the generalizability of the learned dictionary. Moreover, the learned dictionary is optimized to adapt to the given data, resulting in a more compact dictionary and a more robust sparse representation. The experimental results on real datasets demonstrate that the proposed approach can successfully characterize the heterogeneity of the given data, and leads to a better and more robust dictionary.

1 Introduction

Sparse representation has been demonstrated to be very powerful in analyzing high dimensional data [1-3], where each data point can typically be represented as a linear combination of a few atoms in an over-complete dictionary. Assume $x \in \mathbb{R}^d$ is a data vector and $D$ is the dictionary; the sparse representation of $x$ can then be formulated as finding the sparse code $w$ over $D$ by solving the following optimization problem:

$$\min_{w} \; \|w\|_0 \quad \text{s.t.} \quad \|Dw - x\|_2 \le \sigma,$$
where $\sigma$ is a pre-defined threshold. The pursued sparse code $w$ can be considered a robust representation of $x$, and can be used for clustering [4,5], classification [6] and denoising [2,7].
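To make the formulation concrete, below is a minimal sketch of sparse coding with a fixed dictionary. Since the exact $\ell_0$ problem is NP-hard, it uses orthogonal matching pursuit as a standard greedy surrogate; the dictionary sizes, the threshold, and the random data are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

# Toy setup (placeholder sizes, not from the paper): an over-complete
# dictionary D with k > d atoms, and a data vector x that is a sparse
# combination of a few atoms plus noise.
rng = np.random.default_rng(0)
d, k = 64, 256
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)        # unit-norm atoms, a common convention

w_true = np.zeros(k)
w_true[rng.choice(k, size=5, replace=False)] = rng.standard_normal(5)
x = D @ w_true + 0.01 * rng.standard_normal(d)

# Greedy approximation of  min ||w||_0  s.t.  ||Dw - x||_2 <= sigma.
# scikit-learn's `tol` bounds the *squared* residual norm, so pass sigma**2.
sigma = 0.1
omp = OrthogonalMatchingPursuit(tol=sigma**2, fit_intercept=False)
omp.fit(D, x)
w = omp.coef_

print("nonzeros in w:", np.count_nonzero(w))
print("residual norm:", np.linalg.norm(D @ w - x))
```

Greedy pursuits such as OMP and convex $\ell_1$ relaxations are the usual practical substitutes for the combinatorial $\ell_0$ objective.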


One key question is how to construct an over-complete dictionary that is suitable for sparse representation. There are two major approaches for constructing such dictionaries: analytic approaches and learning-based approaches [8]. In an analytic approach, the dictionary is carefully designed a priori, e.g. with atoms such as wavelets [9], curvelets [10] and shearlets [11,12]. One advantage of analytic approaches is that the dictionary can be designed to be well-conditioned for stable representation, for example, to have a better incoherence condition or restricted isometry property [13,14]. In learning-based approaches, the dictionaries are learned from the given data [2,15,16]. Compared to analytic dictionaries, learned dictionaries are adapted to the given data and can therefore provide more compact and accurate sparse representations, as sketched below.
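As an illustration of the learning-based route, the following is a minimal sketch using scikit-learn's MiniBatchDictionaryLearning to fit an over-complete dictionary directly to data. All sizes and parameters are illustrative assumptions, and this is generic dictionary learning, not the hierarchical method proposed in this paper.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Toy training data (placeholder, not from the paper): signals that are
# sparse combinations of atoms from a hidden ground-truth dictionary.
rng = np.random.default_rng(1)
d, k, n = 64, 128, 2000
D_true = rng.standard_normal((d, k))
D_true /= np.linalg.norm(D_true, axis=0)
W = np.zeros((k, n))
for j in range(n):
    idx = rng.choice(k, size=4, replace=False)
    W[idx, j] = rng.standard_normal(4)
X = (D_true @ W).T                     # scikit-learn expects (n_samples, n_features)

# Learn an over-complete dictionary (k > d atoms) from the data itself,
# in contrast to fixing an analytic dictionary such as wavelets a priori.
dl = MiniBatchDictionaryLearning(
    n_components=k,                    # number of atoms to learn
    transform_algorithm="omp",         # sparse-code data against the learned atoms
    transform_n_nonzero_coefs=4,
    random_state=0,
)
codes = dl.fit_transform(X)            # sparse codes, shape (n, k)
D_learned = dl.components_.T           # learned dictionary, shape (d, k)

print("mean nonzeros per code:", np.count_nonzero(codes, axis=1).mean())
```

The approach proposed in this paper regularizes such a learned dictionary to be itself sparse over an a-priori dictionary, aiming to combine the stability of analytic designs with the adaptivity of learned ones.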