Sparse Representations for Speech Recognition
Sparse Representations for Speech Recognition Tara N. Sainath, Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran and Stephen Wright
Abstract This chapter presents the methods that are currently exploited for sparse optimization in speech. It also demonstrates how sparse representations can be constructed for classification and recognition tasks, and gives an overview of recent results that were obtained with sparse representations.
T. N. Sainath (B) · D. Kanevsky · D. Nahamoo · B. Ramabhadran, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). S. Wright, University of Wisconsin, Madison, WI, USA (e-mail: [email protected]).

A. Y. Carmi et al. (eds.), Compressed Sensing & Sparse Filtering, Signals and Communication Technology, DOI: 10.1007/978-3-642-38398-4_15, © Springer-Verlag Berlin Heidelberg 2014

15.1 Introduction

Sparse representation techniques for machine learning applications have become increasingly popular in recent years [1, 2]. Since it is not obvious how to represent speech as a sparse signal, sparse representations have received attention only recently from the speech community [3], where they were proposed originally as a way to enforce exemplar-based representations. Exemplar-based approaches have also found a place in modern speech recognition [4] as an alternative way of modeling observed data. Recent advances in computing power and improvements in machine learning algorithms have made such techniques successful on increasingly complex speech tasks. The goal of exemplar-based modeling is to establish a generalization
from the set of observed data such that accurate inference (classification, decision, recognition) can be made about the data yet to be observed, the "unseen" data. This approach selects a subset of exemplars from the training data to build a local model for every test sample, in contrast with the standard approach, which uses all available training data to build a model before the test sample is seen. Exemplar-based methods, including k-nearest neighbors (kNN) [1], support vector machines (SVMs) and sparse representations (SRs) [3], utilize the details of actual training examples when making a classification decision. Since the number of training examples in speech tasks can be very large, such methods commonly use a small number of training examples to characterize a test vector, that is, a sparse representation. This approach stands in contrast to such standard regression methods as ridge regression [5], nearest subspace [6], and nearest line [6] techniques, which utilize information about all training examples when characterizing a test vector. An SR classifier can be defined as follows. A dictionary H = [h_1; h_2; ...; h_N] is constructed using individual examples of training data, where each h_i ∈ R^m is a feature vector belonging to a specific class. H is an over-complete dictionary, in
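The SR classifier just described can be illustrated with a minimal sketch: solve for a sparse coefficient vector beta with y ≈ H·beta, then assign y to the class whose exemplar columns best reconstruct it. The code below is an illustrative toy, not the authors' implementation; it uses a simple greedy orthogonal matching pursuit in place of the sparse optimizers discussed in this chapter, and the function names `omp` and `sr_classify` are our own.

```python
import numpy as np

def omp(H, y, n_nonzero=5):
    """Greedy orthogonal matching pursuit: find sparse beta with y ~ H @ beta.

    Assumes the columns of H are unit-normalized.
    """
    residual = y.copy()
    support = []
    beta = np.zeros(H.shape[1])
    for _ in range(n_nonzero):
        # Pick the dictionary column most correlated with the current residual.
        j = int(np.argmax(np.abs(H.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the coefficients by least squares on the selected support.
        coef, *_ = np.linalg.lstsq(H[:, support], y, rcond=None)
        beta[:] = 0.0
        beta[support] = coef
        residual = y - H @ beta
    return beta

def sr_classify(H, labels, y, n_nonzero=5):
    """Assign y to the class whose exemplars best reconstruct it.

    labels[i] gives the class of dictionary column H[:, i].
    """
    beta = omp(H, y, n_nonzero)
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruction residual using only this class's exemplars.
        res = np.linalg.norm(y - H[:, mask] @ beta[mask])
        if res < best_res:
            best_class, best_res = c, res
    return best_class
```

In practice each test sample gets its own local model: the sparse solve is repeated per test vector, so only the few exemplars with nonzero coefficients influence the decision, in contrast to ridge regression, which spreads weight over all training examples.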