Detection of interactive voice response (IVR) in phone call records


Andrei Kopylov¹ · Oleg Seredin¹ · Andrei Filin¹,² · Boris Tyshkevich²

Received: 8 January 2020 / Accepted: 11 September 2020

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Separating pre-recorded messages (interactive voice response, IVR) from live speech fragments in real time plays a significant role in speech emotion recognition (SER) systems, unwanted call filtering, automatic detection of answering machine responses, reduction of stored record sizes, voice mail spam filtering, etc. The problem is complex because, unlike the silence, music, and noise fragments handled by conventional voice activity detection (VAD), IVR usually contains speech. Three classifiers for detecting live speech fragments in phone call records are considered, based on the support vector machine (SVM), gradient boosting (XGBoost), and a convolutional neural network (CNN). The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) was used as the feature representation of audio fragments for XGBoost and SVM, and log-spectrograms and gammatonegrams were used for CNN. Experiments with a dataset of phone calls demonstrate comparable quality of the considered algorithms (around 0.96 according to the F1-averaged measure), with CNN having an advantage (0.98).

Keywords IVR · SVM · Gradient boosting · CNN · Speech analysis · GeMAPS · Log-spectrogram · Gammatonegram
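The quality figures above are reported as an F1-averaged measure. The excerpt does not define the averaging, so the following is only a minimal sketch under the assumption that it means the unweighted (macro) mean of the per-class F1 scores for the two classes, live speech and IVR. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 score when `cls` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)  # correctly detected
    fp = sum(1 for t, p in pairs if t != cls and p == cls)  # false alarms
    fn = sum(1 for t, p in pairs if t == cls and p != cls)  # misses
    denom = 2 * tp + fp + fn
    # Assumed convention: F1 is 0 if the class neither occurs nor is predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             classes: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of per-class F1 scores (assumed meaning of 'F1-averaged')."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy fragment labels, invented for illustration: 1 = live speech, 0 = IVR.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
score = macro_f1(y_true, y_pred)
print(f"macro-averaged F1: {score:.2f}")
```

Averaging over both classes, rather than reporting F1 for live speech alone, keeps the score from being dominated by whichever class is more frequent in the call records.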

¹ Tula State University, Tula, Russia
² ITooLabs, Tula, Russia

1 Introduction

The development of computer telephony drives the growing popularity of virtual call centers. They are often considered a required component of IT infrastructure and of today’s economy. Such centers generate huge amounts of audio data, so intelligent algorithms should be applied to analyze it. In particular, automated speech emotion recognition (SER) in dialogues makes it possible to enhance the commonly used call center key performance indicators (KPIs) and to introduce new KPIs based on round-the-clock monitoring. A required step in a SER system is separating the spontaneous speech fragments to be analyzed, referred to as “live speech” hereinafter, from recorded speech fragments, which may also contain noise, music, or silence. Most of today’s call centers use pre-recorded messages (IVR) for automated interaction with clients (e.g. routing calls, interactive queues, so-called “cold calling”, etc.). IVR detection is also required for unwanted call filtering, automatic detection of answering machine responses, reduction of stored record sizes, voice mail spam filtering, etc., and it goes beyond the traditional detection of speech fragments (VAD) in audio streams. In most cases, a person can determine whether a sentence is pre-recorded or live. That is why it should be possible to solve the identification problem with advanced machine learning methods. As far as we know, IVR detect
