Sparse Representations for Speech Recognition
Sparse Representations for Speech Recognition Tara N. Sainath, Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran and Stephen Wright
Abstract This chapter presents the methods that are currently exploited for sparse optimization in speech. It also demonstrates how sparse representations can be constructed for classification and recognition tasks, and gives an overview of recent results that were obtained with sparse representations.
T. N. Sainath (B) · D. Kanevsky · D. Nahamoo · B. Ramabhadran, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). S. Wright, University of Wisconsin, Madison, WI, USA (e-mail: [email protected]).

A. Y. Carmi et al. (eds.), Compressed Sensing & Sparse Filtering, Signals and Communication Technology, DOI: 10.1007/978-3-642-38398-4_15, © Springer-Verlag Berlin Heidelberg 2014

15.1 Introduction

Sparse representation techniques for machine learning applications have become increasingly popular in recent years [1, 2]. Since it is not obvious how to represent speech as a sparse signal, sparse representations have received attention only recently from the speech community [3], where they were proposed originally as a way to enforce exemplar-based representations. Exemplar-based approaches have also found a place in modern speech recognition [4] as an alternative way of modeling observed data. Recent advances in computing power and improvements in machine learning algorithms have made such techniques successful on increasingly complex speech tasks. The goal of exemplar-based modeling is to establish a generalization
from the set of observed data such that accurate inference (classification, decision, recognition) can be made about the data yet to be observed, the "unseen" data. This approach selects a subset of exemplars from the training data to build a local model for every test sample, in contrast with the standard approach, which uses all available training data to build a model before the test sample is seen. Exemplar-based methods, including k-nearest neighbors (kNN) [1], support vector machines (SVMs) and sparse representations (SRs) [3], utilize the details of actual training examples when making a classification decision. Since the number of training examples in speech tasks can be very large, such methods commonly use a small number of training examples to characterize a test vector, that is, a sparse representation. This approach stands in contrast to such standard regression methods as ridge regression [5], nearest subspace [6], and nearest line [6] techniques, which utilize information about all training examples when characterizing a test vector. An SR classifier can be defined as follows. A dictionary H = [h_1; h_2; ...; h_N] is constructed using individual examples of training data, where each h_i ∈ R^m is a feature vector belonging to a specific class. H is an over-complete dictionary, in
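The SR classifier just described can be illustrated with a minimal sketch: solve for a sparse coefficient vector beta with y ≈ H·beta, then assign y to the class whose exemplar columns best reconstruct it. The code below is an illustrative toy, not the authors' implementation; it uses a simple greedy orthogonal matching pursuit in place of the sparse optimizers discussed in this chapter, and the function names `omp` and `sr_classify` are our own.

```python
import numpy as np

def omp(H, y, n_nonzero=5):
    """Greedy orthogonal matching pursuit: find sparse beta with y ~ H @ beta.

    Assumes the columns of H are unit-normalized.
    """
    residual = y.copy()
    support = []
    beta = np.zeros(H.shape[1])
    for _ in range(n_nonzero):
        # Pick the dictionary column most correlated with the current residual.
        j = int(np.argmax(np.abs(H.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit the coefficients by least squares on the selected support.
        coef, *_ = np.linalg.lstsq(H[:, support], y, rcond=None)
        beta[:] = 0.0
        beta[support] = coef
        residual = y - H @ beta
    return beta

def sr_classify(H, labels, y, n_nonzero=5):
    """Assign y to the class whose exemplars best reconstruct it.

    labels[i] gives the class of dictionary column H[:, i].
    """
    beta = omp(H, y, n_nonzero)
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruction residual using only this class's exemplars.
        res = np.linalg.norm(y - H[:, mask] @ beta[mask])
        if res < best_res:
            best_class, best_res = c, res
    return best_class
```

In practice each test sample gets its own local model: the sparse solve is repeated per test vector, so only the few exemplars with nonzero coefficients influence the decision, in contrast to ridge regression, which spreads weight over all training examples.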