Pitch Correlogram Clustering for Fast Speaker Identification

Nitin Jhanwar
Research and Development Division, Danlaw Technologies India Limited, Hyderabad 500 034, India

Ajay K. Raina
Research and Development Division, Danlaw Technologies India Limited, Hyderabad 500 034, India
Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria 3010, Australia

Received 24 June 2003; Revised 25 June 2004; Recommended for Publication by Bastiaan Kleijn

Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. However, for large speaker databases, their high computational run-time limits their use in online or real-time speaker identification. Two-stage identification systems, in which the database is partitioned into clusters according to some proximity criterion and only the GMMs belonging to a single cluster are run in each test, have been suggested in the literature to speed up the identification process. However, most clustering algorithms used so far have shown limited success, apparently because the clustering and GMM feature spaces are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram, which captures the frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics such as cepstral coefficients and spectral slopes. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction in overall speaker identification error.

Keywords and phrases: speaker identification, clustering, pitch, correlogram.
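The abstract describes the two-stage idea only at a high level, and this excerpt does not give the paper's exact definition of the pitch correlogram. The Python sketch below is therefore a minimal illustration under stated assumptions, not the authors' method. It assumes a frame-level F0 track in Hz (0 for unvoiced frames), approximates the correlogram as a normalized histogram of quantized pitch pairs at consecutive frames, groups enrolled speakers with plain k-means, and returns the speakers in the nearest cluster as the shortlist whose GMMs a second stage would score. The pitch range, bin count, clustering algorithm, and all function names (`pitch_correlogram`, `build_clusters`, `shortlist`, and so on) are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical settings: the excerpt does not give the paper's pitch range or binning.
F0_MIN, F0_MAX, N_BINS = 50.0, 400.0, 8


def pitch_bin(f0: float) -> int:
    """Map a voiced F0 value (Hz) to one of N_BINS log-spaced bins."""
    f0 = min(max(f0, F0_MIN), F0_MAX)
    pos = math.log(f0 / F0_MIN) / math.log(F0_MAX / F0_MIN)  # 0.0 .. 1.0
    return min(int(pos * N_BINS), N_BINS - 1)


def pitch_correlogram(f0_track: Sequence[float]) -> List[float]:
    """Normalized N_BINS x N_BINS histogram of (bin[t], bin[t+1]) pairs, flattened
    row-major. Diagonal cells count steady pitch; off-diagonal cells count
    frame-to-frame pitch changes. Frames with f0 <= 0 are treated as unvoiced."""
    counts = [0.0] * (N_BINS * N_BINS)
    for a, b in zip(f0_track, f0_track[1:]):
        if a > 0 and b > 0:  # only pairs of consecutive voiced frames
            counts[pitch_bin(a) * N_BINS + pitch_bin(b)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest(v: Sequence[float], centroids: List[List[float]]) -> int:
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))


def kmeans(vectors: List[List[float]], k: int, iters: int = 20) -> List[List[float]]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups: List[List[List[float]]] = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for j, g in enumerate(groups):
            if g:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def build_clusters(
    train: Dict[str, Sequence[float]], k: int
) -> Tuple[List[List[float]], Dict[int, List[str]]]:
    """Offline step: one correlogram per enrolled speaker, then cluster them."""
    names = sorted(train)
    feats = [pitch_correlogram(train[n]) for n in names]
    centroids = kmeans(feats, k)
    members: Dict[int, List[str]] = {j: [] for j in range(k)}
    for n, f in zip(names, feats):
        members[nearest(f, centroids)].append(n)
    return centroids, members


def shortlist(test_f0: Sequence[float], centroids, members) -> List[str]:
    """Stage 1 at test time: the speakers whose GMMs stage 2 still has to score."""
    return members[nearest(pitch_correlogram(test_f0), centroids)]


if __name__ == "__main__":
    # Toy, constant-pitch tracks purely for illustration.
    train = {
        "spk_low1": [100.0] * 10, "spk_low2": [105.0] * 10,
        "spk_high1": [280.0] * 10, "spk_high2": [300.0] * 10,
    }
    centroids, members = build_clusters(train, k=2)
    print(shortlist([102.0] * 6, centroids, members))  # ['spk_low1', 'spk_low2']
```

In a scheme like this, the second stage evaluates only the shortlisted speakers' GMMs, so the saving depends on how evenly the clusters split the database, while any test utterance routed to the wrong cluster becomes an identification error.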

1. INTRODUCTION

Speaker recognition aims at extracting and modeling characteristics of speech data that uniquely represent a person. These characteristics should ideally be robust to channel effects and noisy environments [1]. Cepstral coefficients [2] in the Mel frequency domain are, in this sense, the most robust of all the feature vectors currently employed in speech recognition systems built on a Gaussian mixture model (GMM) framework [3, 4]. At present, these features are also commonly employed for speaker identification, even though the best feature vector for speaker identification, as opposed to speech recognition, remains an open problem. Recent papers suggest transformed feature vectors for improving the performance of speaker identification systems [5]. Campbell's paper [6] is still a very good reference on the problems and issues involved in speaker identification. Speaker recognition is performed at two levels: verification [7, 8, 9] and identification [1, 10]. Verification systems are closed-set operations in which a speaker's claim to be one of the enrolled speakers is verified, generally in a cooperative, text-prompted mode, as in voice-based access control systems.
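For reference, the Python sketch below shows the usual closed-set identification rule in a GMM framework: each enrolled speaker has a GMM over feature vectors (for example, Mel-frequency cepstral coefficients), and a test utterance is assigned to the speaker whose model gives the highest average per-frame log-likelihood. It is a minimal sketch that assumes diagonal covariances and models that have already been trained (for example, by EM, not shown); `DiagGMM` and `identify` are hypothetical names, not an API from the paper or from any library.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagGMM:
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # diagonal covariance per component, all > 0

    def frame_loglik(self, x: Sequence[float]) -> float:
        """log p(x) = log sum_m w_m N(x; mu_m, diag(var_m)), via log-sum-exp."""
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            terms.append(ll)
        top = max(terms)  # subtract the max before exp() to avoid underflow
        return top + math.log(sum(math.exp(t - top) for t in terms))

    def avg_loglik(self, frames: Sequence[Sequence[float]]) -> float:
        """Average per-frame log-likelihood, treating frames as independent."""
        return sum(self.frame_loglik(x) for x in frames) / len(frames)


def identify(frames: Sequence[Sequence[float]], models: Dict[str, DiagGMM]) -> str:
    """Closed-set identification: the enrolled speaker with the best-scoring GMM."""
    return max(models, key=lambda s: models[s].avg_loglik(frames))


if __name__ == "__main__":
    # Toy 1-D, single-component models purely for illustration.
    models = {
        "a": DiagGMM([1.0], [[0.0]], [[1.0]]),
        "b": DiagGMM([1.0], [[5.0]], [[1.0]]),
    }
    print(identify([[0.1], [-0.2]], models))  # a
```

In a two-stage system of the kind described in the abstract, `models` would hold only the GMMs of the speakers in the selected cluster, so fewer likelihoods are computed per test.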

The system conducts a binary hypothesis test, relative to the claimed identity, on the speech feature data
