Efficient Algorithm and Architecture of Critical-Band Transform for Low-Power Speech Applications



Research Article

Chao Wang (1, 2) and Woon-Seng Gan (2)

(1) Center for Signal Processing, School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798
(2) Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798

Received 15 December 2005; Revised 8 December 2006; Accepted 18 January 2007

Recommended by Hugo Van Hamme

An efficient algorithm and its corresponding VLSI architecture for the critical-band transform (CBT) are developed to approximate the critical-band filtering of the human ear. The CBT consists of a constant-bandwidth transform in the lower frequency range and a Brown constant-Q transform (CQT) in the higher frequency range. The corresponding VLSI architecture is proposed to achieve significant power efficiency by reducing the computational complexity, using pipeline and parallel processing, and applying the supply voltage scaling technique. A 21-band Bark scale CBT processor with a sampling rate of 16 kHz is designed and simulated. Simulation results verify its suitability for performing short-time spectral analysis on speech. It fits the critical-band analysis of the human ear more closely, requires significantly fewer computations, and is therefore more energy-efficient than other methods. With a 0.35 μm CMOS technology, it processes a 160-point speech frame in 4.99 milliseconds at 234 kHz. The power dissipation is 15.6 μW at 1.1 V. It achieves an 82.1% power reduction compared to a benchmark 256-point FFT processor.

Copyright © 2007 C. Wang and W.-S. Gan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
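The split between a constant-bandwidth region and a constant-Q region can be illustrated numerically. The following is a minimal sketch, not the authors' algorithm: it assumes the commonly tabulated Bark band edges up to 7.7 kHz (these values are not given in this article), and it takes each band's center as the arithmetic midpoint of its edges, which is also an assumption. The names `BARK_EDGES_HZ` and `bark_bands` are hypothetical.

```python
# Critical-band edges in Hz up to 7.7 kHz.
# Assumption: values from the commonly tabulated Bark scale, not from this article.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]


def bark_bands(edges=BARK_EDGES_HZ):
    """Return a list of (center_hz, bandwidth_hz, q) tuples, one per band.

    The center is the arithmetic midpoint of the band edges (an assumption),
    and q is the ratio of center frequency to bandwidth.
    """
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        bandwidth = hi - lo
        center = (lo + hi) / 2
        bands.append((center, bandwidth, center / bandwidth))
    return bands


if __name__ == "__main__":
    for i, (fc, bw, q) in enumerate(bark_bands(), start=1):
        print(f"band {i:2d}: center {fc:7.1f} Hz, bandwidth {bw:6.1f} Hz, Q {q:5.2f}")
```

Under these assumptions the table yields 21 bands. The lowest bands share a bandwidth of 100 Hz, so their Q grows with center frequency, while the upper bands widen with frequency and their Q stays within a comparatively narrow range. This is the behavior that motivates pairing a constant-bandwidth transform with a constant-Q transform.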

1. INTRODUCTION

Spectral analysis is one of the most fundamental operations in acoustic and speech signal processing. It transforms the time-domain acoustic signal into a frequency-domain spectrum. Traditional methods, such as the fast Fourier transform (FFT), the short-time Fourier transform, and the filterbank (a group of bandpass filters), have been widely used in academia and industry. These methods usually have a constant frequency resolution. However, psychoacoustical studies show that the human ear performs spectral analysis on the acoustic signal in the form of a filterbank with nonuniform critical bandwidths [1]. For wide-band speech with a bandwidth of 8 kHz, there are 21 critical bands for the Bark scale described by Zwicker [2] and 24 bands for the Mel scale [3]. An interesting finding is that the bandwidths of the critical bands with center frequencies below a certain frequency are approximately constant. The bandwidths are around 100 Hz below 500 Hz in the Bark scale and below 1 kHz in the Mel scale. Above 500 Hz in the Bark scale or 1 kHz in the Mel scale, the bandwidths increase as the center frequencie