Structuring Broadcast Audio for Information Access
Jean-Luc Gauvain
Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France
Email: [email protected]

Lori Lamel
Spoken Language Processing Group, LIMSI-CNRS, BP 133, 91403 Orsay Cedex, France
Email: [email protected]

Received 10 May 2002 and in revised form 3 November 2002

One rapidly expanding application area for state-of-the-art speech recognition technology is the automatic processing of broadcast audiovisual data for information access. Since much of the linguistic information is found in the audio channel, speech recognition is a key enabling technology which, when combined with information retrieval techniques, can be used for searching large audiovisual document collections. Audio indexing must take into account the specificities of audio data, such as the need to deal with a continuous data stream and an imperfect word transcription. Other important considerations are dealing with language specificities and facilitating language portability. At the Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), broadcast news transcription systems have been developed for seven languages: English, French, German, Mandarin, Portuguese, Spanish, and Arabic. The transcription systems have been integrated into prototype demonstrators for several application areas such as audio data mining, structuring audiovisual archives, selective dissemination of information, and topic tracking for media monitoring. As examples, this paper addresses the spoken document retrieval and topic tracking tasks.

Keywords and phrases: audio indexing, structuring audio data, multilingual speech recognition, audio partitioning, spoken document retrieval, topic tracking.
1. INTRODUCTION
The amount of information accessible electronically is growing at a very fast rate. In the speech and audio processing domain, the information sources of primary interest are radio, television, and telephone speech, a variety of which are available on the Internet. Extrapolating from Lesk (1997) [1], we can estimate that there are about 50,000 radio stations and 10,000 television stations worldwide. If each station transmits a few hours of unique broadcasts per day, there are well over 100,000 hours of potentially interesting data produced annually (excluding mainly music, films, and TV series). Although not the subject of this paper, the largest amount of audio data produced evidently consists of telephone conversations (estimated at over 30,000 petabytes annually). In contrast, the amount of textual data produced annually, including newspapers and web texts, can be estimated at a few terabytes. Despite the quantity and the rapid growth rate, it is possible to store all of this data, should there be a reason to do so. What is lacking is an efficient means of accessing the content of the audio and audiovisual data. As an example, the French National Institute of Audiovisual archives (INA) holds over 1.5 million hours of audiovisual data. The vast majority of this data has only very
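The order-of-magnitude estimate above can be reproduced with a short back-of-envelope script. The station counts come from the text; the figure of two hours of unique programming per station per day is an illustrative assumption standing in for "a few hours".

```python
# Back-of-envelope estimate of worldwide broadcast audio volume.
# Station counts are taken from the text (extrapolated from Lesk, 1997);
# unique_hours_per_day is an assumed illustrative value.
radio_stations = 50_000
tv_stations = 10_000
unique_hours_per_day = 2  # assumption: "a few hours" of unique broadcasts

daily_hours = (radio_stations + tv_stations) * unique_hours_per_day
annual_hours = daily_hours * 365

print(f"{daily_hours:,} hours/day, {annual_hours:,} hours/year")
```

Even if only a small fraction of this raw volume counts as "potentially interesting" after excluding music, films, and series, the total comfortably exceeds 100,000 hours per year.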