Real-time Informatized caption enhancement based on speaker pronunciation time database
Yong-Sik Choi 1 & Jin-Gu Kang 2 & Jong Wha J. Joo 2 & Jin-Woo Jung 2
Received: 18 February 2019 / Revised: 1 December 2019 / Accepted: 11 August 2020
© The Author(s) 2020
Abstract
IBM Watson is a representative speech recognition tool that automatically generates not only speech-to-text output but also speaker IDs and timing information; the combined result is called an Informatized Caption. However, if the voice signal passed to the IBM Watson API contains noise, recognition performance decreases significantly. This is easily observed in movies with background music and special sound effects. This paper aims to improve the accuracy of current Informatized Captions in noisy environments. Based on the original caption and the Informatized Caption information, it proposes a method for correcting incorrectly recognized words and a method for enhancing timing accuracy while updating the database in real time. Experimental results show that the proposed method achieves 81.09% timing accuracy on 10 representative animation, horror, and action movies.
Keywords Informatized caption · Speaker pronunciation time · IBM Watson API · Speech to text translation
* Jin-Woo Jung [email protected]
1 Department of Artificial Intelligence, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
2 Department of Computer Science and Engineering, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
Multimedia Tools and Applications
Abbreviations
X: Word
Ts(X): Original caption of a word X
Ts+(X): Informatized Caption of a word X
tS(X): Start time of a word X
tE(X): End time of a word X
S-DB: Speaker pronunciation time database
Sp: p-th speaker
Wpk: k-th word of p-th speaker in S-DB
Up(Wpk): Appearance frequency of k-th word of p-th speaker
Dp(Wpk): Average pronunciation time of k-th word of p-th speaker
V(X): The number of characters in a word X
D(S, X): Average pronunciation time of a word X of speaker S
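The S-DB quantities listed in the abbreviations (appearance frequency Up(Wpk) and average pronunciation time Dp(Wpk)) can be illustrated with a small running-average store. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes the pronunciation time of a word is tE(X) minus tS(X), that the average is updated incrementally each time a word is observed, and that words are matched case-insensitively. The names `WordStats`, `SpeakerTimeDB`, `update`, and `average_time` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class WordStats:
    count: int = 0         # plays the role of Up(Wpk): appearance frequency
    avg_time: float = 0.0  # plays the role of Dp(Wpk): average pronunciation time (s)


@dataclass
class SpeakerTimeDB:
    # Hypothetical S-DB layout: (speaker id, lowercased word) -> statistics.
    entries: Dict[Tuple[str, str], WordStats] = field(default_factory=dict)

    def update(self, speaker: str, word: str, t_start: float, t_end: float) -> WordStats:
        """Record one observation of `word` spoken by `speaker`.

        Assumes the pronunciation time is t_end - t_start, i.e. tE(X) - tS(X).
        """
        duration = t_end - t_start
        if duration <= 0:
            raise ValueError("end time must be later than start time")
        stats = self.entries.setdefault((speaker, word.lower()), WordStats())
        stats.count += 1
        # Incremental mean: new_avg = old_avg + (x - old_avg) / n
        stats.avg_time += (duration - stats.avg_time) / stats.count
        return stats

    def average_time(self, speaker: str, word: str) -> Optional[float]:
        """Return the stored average for (speaker, word), or None if unseen."""
        stats = self.entries.get((speaker, word.lower()))
        return stats.avg_time if stats is not None else None


if __name__ == "__main__":
    db = SpeakerTimeDB()
    db.update("S1", "hello", 1.0, 1.4)
    db.update("S1", "hello", 5.0, 5.6)
    print(db.average_time("S1", "hello"))
```

The incremental-mean update keeps each entry constant-size, which suits updating the database while captions stream in; how unseen words are handled is left open here because this section does not specify it.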
1 Introduction
In recent years, artificial intelligence has come into wide use in various fields [1, 2, 7, 12, 13, 17, 19]. Artificial intelligence currently encompasses a huge variety of subfields, ranging from general areas such as learning and perception to specific tasks such as playing chess, proving mathematical theorems, writing poetry, driving a car on a crowded street, and diagnosing diseases. Artificial intelligence is relevant to any intellectual task; it is truly a universal field [15]. One actively researched area is natural language processing through speech recognition. However, it is difficult for machines to speak, hear, and read human language. Therefore, natural language processing and speech recognition are among the most difficult and important fields in artificial intelligence [6]. One of the most popular speech recognition technologies is the IBM Watson API [9]. Among captions in which speech is converted into characters, captions