A Prototype System for Selective Dissemination of Broadcast News in European Portuguese



Research Article

R. Amaral,1,2,3 H. Meinedo,1,3 D. Caseiro,1,3 I. Trancoso,1,3 and J. Neto1,3

1 Instituto Superior Técnico, Universidade Técnica de Lisboa, 1049-001 Lisboa, Portugal
2 Escola Superior de Tecnologia, Instituto Politécnico de Setúbal, 2914-503 Setúbal, Portugal
3 Spoken Language Systems Lab L2F, Institute for Systems and Computer Engineering: Research and Development (INESC-ID), 1000-029 Lisboa, Portugal

Received 8 September 2006; Accepted 14 April 2007

Recommended by Ebroul Izquierdo

This paper describes ongoing work on selective dissemination of broadcast news. Our pipeline system includes several modules: audio preprocessing, speech recognition, and topic segmentation and indexation. The main goal of this work is to study the impact of errors in the earlier modules on the later ones. The impact of audio preprocessing errors is quite small on the speech recognition module, but quite significant in terms of topic segmentation. On the other hand, the impact of speech recognition errors on the topic segmentation and indexation modules is almost negligible. Diagnosing the errors in these modules is a very important step toward improving the prototype media watch system described in this paper.

Copyright © 2007 R. Amaral et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

The goal of this paper is to give a current overview of a prototype system for selective dissemination of broadcast news (BN) in European Portuguese. The system is capable of continuously monitoring a TV channel and searching inside its news shows for stories that match the profile of a given user. The system may be tuned to automatically detect the start and end of a broadcast news program. Once the start is detected, the system automatically records, transcribes, indexes, summarizes, and stores the program. The system then searches all the user profiles for those that match the detected topics. If any topic matches the user preferences, an email is sent to that user, indicating the occurrence and location of one or more stories about the selected topics. This alert message enables the user to follow the links to the video clips referring to the selected stories.

Although the development of this system started during the past ALERT European Project, we are continuously trying to improve it, since it integrates several core technologies that are within the most important research areas of our group. The first of these core technologies is audio preprocessing (APP), or speaker diarization, which aims at speech/nonspeech classification, speaker segmentation, speaker clustering, and gender and background conditions classification. The second one is automatic speech recognition (ASR), which converts the segments classified as speech into text. T