Robust features for text-independent speaker recognition with short utterances


ORIGINAL ARTICLE

Rania Chakroun1,3 • Mondher Frikha1,2

Received: 12 July 2019 / Accepted: 17 February 2020
© Springer-Verlag London Ltd., part of Springer Nature 2020

Abstract Speaker recognition systems achieve good performance under controlled conditions. However, in real-world conditions, performance degrades drastically. The principal cause is the limited amount of speech data available; the presence of background noise is another main source of degradation. Despite major advances in the speaker recognition field, the effect of noise and the limited amount of available speech data remain open problems, and no optimal solution has yet been found to cope with them. In this paper, we propose a new system using new enhanced and reduced gammatone coefficients to improve robustness with limited speech data duration. We demonstrate the usefulness of these coefficients compared to well-known features with speakers taken from different databases recorded under different conditions.

Keywords Speaker recognition • Speaker identification • i-vector • PLDA • Short utterances • Noise

1 Introduction

Speaker recognition is the ability to recognize an individual from his or her voice alone. This domain has received much attention from the scientific community for many years [1–3]. In fact, this technique makes it possible to use the speaker's voice to verify the user's identity and control access to many services such as voice dialing, telephone shopping, telephone banking, database access services, voice mail, information services, security control for confidential information areas, and remote access to computers. In this manner, speaker recognition technology is expected to create new services that will make our daily lives more convenient.

Rania Chakroun • Mondher Frikha

1 Advanced Technologies for Image and Signal Processing (ATISP) Research Unit, Sfax, Tunisia
2 National School of Electronics and Telecommunications of Sfax, Sfax, Tunisia
3 National School of Engineering of Sfax, Sfax, Tunisia

Speaker recognition is a broad area that can be divided into two fundamental applications: speaker identification and speaker verification. For the identification task, an unknown speaker is compared against a dataset of known speakers, and the best-matching speaker is taken as the identification result. For the verification task, the system's purpose is to decide whether a voice sample was produced by the claimed person. Both speaker identification and speaker verification can be divided into text-dependent and text-independent methods. In text-dependent systems, speaker recognition depends on a specific text being spoken. This method is simpler for the system. For text-independent systems, there are no limitations for the text used in the test or i