Summary
Speech recognition technology can be used for a wide range of applications. Keyword spotting (KWS) is one of the more practical implementations of speech recognition, as it does not require any understanding of the transcribed speech, nor does it necessarily demand full transcription accuracy. Naturally, the leading approach in ASR technology, LVCSR, is often carried over to other speech processing domains, and KWS is no exception. However, because KWS is generally performed on extremely large speech databases, LVCSR is not necessarily the most practical method. Fully transcribing huge amounts of speech is a computationally complex process that demands knowledge sources, such as a very large recognition vocabulary and a complex language model, that bind the resulting transcription to decisions that are difficult to reverse. The complexity and lack of flexibility of LVCSR KWS mechanisms, coupled with the demands of application users for vocabulary-independent and fast KWS, have led researchers to search for alternative solutions.
Phonetic search is one method that researchers have turned to. Like the LVCSR method, phonetic search transforms the speech into text before the KWS task begins. This is an advantage, as the transformation is an off-line, one-time process, after which keyword spotting can be performed quickly and repeatedly. However, unlike LVCSR KWS, the resulting text is not an attempt at transcribing the speech word for word, but rather a low-level transcription at the phonemic level. Although the phonemic transcription may also be laden with errors, these can be overcome by generating phoneme lattices that represent multiple hypotheses and by using smart distance calculations that compare the keyword transcriptions with the database transcription. Because the transcription is phonemic rather than word-based, new keywords can be searched for without ever having to rerun the textual transformation stage before a search.
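The idea of comparing a keyword's phonemic transcription with the database transcription can be illustrated with a minimal sketch. It is not the method of this brief: it assumes a single best phoneme string rather than a lattice, uniform edit costs rather than a tuned distance, and ARPAbet-style phoneme symbols. The function name `spot_keyword` and the parameter `max_distance` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def spot_keyword(
    keyword: Sequence[str],
    transcript: Sequence[str],
    max_distance: int = 1,
) -> List[Tuple[int, int]]:
    """Approximate phoneme-string matching with unit edit costs.

    Returns (end_index, distance) for every transcript position at which
    some span ending there lies within `max_distance` edits of `keyword`.
    Assumption: a real system would use confusion-based costs and lattices.
    """
    m = len(keyword)
    # prev[i] = cheapest cost of aligning keyword[:i] with a transcript span
    # ending just before the current position. Before any transcript phoneme
    # is read, aligning keyword[:i] costs i edits.
    prev = list(range(m + 1))
    hits: List[Tuple[int, int]] = []
    for j, phone in enumerate(transcript):
        # curr[0] stays 0 so that a match may start at any position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            substitution = prev[i - 1] + int(keyword[i - 1] != phone)
            extra_in_transcript = prev[i] + 1      # spurious recognized phoneme
            missing_in_transcript = curr[i - 1] + 1  # keyword phoneme dropped
            curr[i] = min(substitution, extra_in_transcript, missing_in_transcript)
        if curr[m] <= max_distance:
            hits.append((j, curr[m]))
        prev = curr
    return hits


# The database is transcribed once; any number of keywords can then be searched.
transcript = "dh ax k ae t s ae t".split()
print(spot_keyword("k ae t".split(), transcript, max_distance=0))  # [(4, 0)]
print(spot_keyword("s ae t".split(), transcript, max_distance=0))  # [(7, 0)]
```

Raising `max_distance` tolerates recognition errors at the cost of more false alarms. Note that every keyword still scans the whole transcript, which is the source of the complexity problem on large databases.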
Still, the phonetic search KWS process is hampered by high computational complexity when performed on large speech databases, a situation that is unacceptable for real-world applications. Various methods have been suggested to reduce the computational complexity of the search. Some aim to accelerate the search itself, while others aim to organize the searched database better through efficient indexing, optimizing it for quick retrieval.
A. Moyal et al., Phonetic Search Methods for Large Speech Databases, SpringerBriefs in Speech Technology, DOI 10.1007/978-1-4614-6489-1_7, © The Author(s) 2013
This brief suggests an anchor-based search algorithm that reduces this computational complexity and makes the phonetic search process usable for applications that need rapid searching of large speech databases. The analysis addresses phonetic search in its generic form by focusing on the selection of reliable hypotheses. Using a search algorithm based on phoneme anchor points, the search space and computational complexity of phonetic search KWS were reduced by almost 90%. Prior to beginning the ex