Real-time Informatized caption enhancement based on speaker pronunciation time database


Yong-Sik Choi 1 · Jin-Gu Kang 2 · Jong Wha J. Joo 2 · Jin-Woo Jung 2

Received: 18 February 2019 / Revised: 1 December 2019 / Accepted: 11 August 2020
© The Author(s) 2020

Abstract

IBM Watson is a representative speech recognition tool that automatically generates not only speech-to-text output but also speaker IDs and timing information; this combined output is called an Informatized Caption. However, when the voice signal passed to the IBM Watson API contains noise, recognition performance decreases significantly. This commonly occurs in movies with background music and special sound effects. This paper aims to improve the accuracy of current Informatized Captions in noisy environments. Based on the original caption and the Informatized Caption information, a method for correcting incorrectly recognized words and a method for enhancing timing accuracy while updating the database in real time are proposed. Experimental results show that the proposed method achieves 81.09% timing accuracy on 10 representative animation, horror, and action movies.

Keywords Informatized caption · Speaker pronunciation time · IBM Watson API · Speech to text translation

Abbreviations

X: Word
Ts(X): Original caption of a word X
Ts+(X): Informatized Caption of a word X
tS(X): Start time of a word X
tE(X): End time of a word X
S-DB: Speaker pronunciation time database
Sp: p-th speaker
Wpk: k-th word of p-th speaker in S-DB

* Jin-Woo Jung [email protected]

1 Department of Artificial Intelligence, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea

2 Department of Computer Science and Engineering, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea

Multimedia Tools and Applications

Up(Wpk): Appearance frequency of k-th word of p-th speaker
Dp(Wpk): Average pronunciation time of k-th word of p-th speaker
V(X): The number of characters in a word X
D(S, X): Average pronunciation time of a word X of speaker S
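To make the notation concrete, the following is a minimal sketch of how a speaker pronunciation time database could hold the appearance frequency Up(Wpk) and the average pronunciation time Dp(Wpk) and update them as each new word arrives. The class and method names are hypothetical, and the incremental-mean update rule is an assumption chosen for illustration; it is not taken from the text above and does not claim to be the method proposed in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WordStats:
    """Per-speaker statistics for one word (hypothetical structure)."""
    frequency: int = 0          # plays the role of Up(Wpk)
    mean_duration: float = 0.0  # plays the role of Dp(Wpk), in seconds


class SpeakerPronunciationDB:
    """Illustrative S-DB: speaker ID -> word -> WordStats."""

    def __init__(self):
        self._stats = defaultdict(dict)

    def record(self, speaker, word, start, end):
        """Add one observation with start time tS(X) and end time tE(X)."""
        if end < start:
            raise ValueError("end time must not precede start time")
        key = word.lower()  # assumption: words are matched case-insensitively
        stats = self._stats[speaker].setdefault(key, WordStats())
        stats.frequency += 1
        # Assumed update rule: running mean, so no past samples are stored.
        duration = end - start
        stats.mean_duration += (duration - stats.mean_duration) / stats.frequency
        return stats

    def average_duration(self, speaker, word):
        """Return the stored average for (speaker, word), or None if unseen."""
        stats = self._stats.get(speaker, {}).get(word.lower())
        return stats.mean_duration if stats else None


if __name__ == "__main__":
    db = SpeakerPronunciationDB()
    db.record("S1", "hello", 1.0, 1.4)
    db.record("S1", "hello", 5.0, 5.6)
    print(db.average_duration("S1", "hello"))
    print(db.average_duration("S2", "hello"))
```

A running mean keeps each update constant-time, which suits a database that is refreshed while captions are being processed; how unseen words or speakers should be handled is left open here because the text above does not specify it.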

1 Introduction

In recent years, artificial intelligence has come into wide use in various fields [1, 2, 7, 12, 13, 17, 19]. Artificial intelligence currently encompasses a huge variety of subfields, ranging from the general, such as learning and perception, to the specific, such as playing chess, proving mathematical theorems, writing poetry, driving a car on a crowded street, and diagnosing diseases. Artificial intelligence is relevant to any intellectual task; it is truly a universal field [15]. One of the actively researched areas is natural language processing through speech recognition. However, it is difficult for machines to speak, hear, and read human language. Therefore, natural language processing and speech recognition are among the most difficult and important fields in artificial intelligence [6]. One of the most popular speech recognition technologies is the IBM Watson API [9]. Among captions in which speech is converted into characters, captions
