Term-Weighting for Summarization of Multi-party Spoken Dialogues


Abstract. This paper explores the issue of term-weighting in the genre of spontaneous, multi-party spoken dialogues, with the intent of using such term-weights in the creation of extractive meeting summaries. The field of text information retrieval has yielded many term-weighting techniques to import for our purposes; this paper implements and compares several of these, namely tf.idf, Residual IDF and Gain. We propose that term-weighting for multi-party dialogues can exploit patterns in word usage among participant speakers, and introduce the su.idf metric as one attempt to do so. Results for all metrics are reported on both manual and automatic speech recognition (ASR) transcripts, and on both the ICSI and AMI meeting corpora.

1 Introduction

The primary focus of this research is to create extractive summaries of meeting speech, in order to present users with concise and informative overviews of the content of meetings. Such extractive summaries, when incorporated into a meeting browser, can act as efficient tools for navigating meeting records as a whole. This paper focuses on one fundamental component of the extractive summarization pipeline: the way that terms are weighted within a given meeting, and the bearing that various term-weighting schemes have on extraction performance.

Choosing and implementing a term-weighting method is often the first step in building an automatic summarization system. Though the unit of extraction may be the sentence or the dialogue act, those units need to be weighted by the importance of their constituent words. Popular text summarization techniques such as Maximal Marginal Relevance (MMR) and Latent Semantic Analysis (LSA) begin by representing sentences as vectors of term weights. There is a wide variety of term-weighting schemes available, from simple binary weights of word presence/absence to more complex weighting schemes such as tf.idf and tf.ridf. Several of these are described in the following section.

A central question of this paper is whether term-weighting techniques developed for information retrieval (IR) and summarization tasks on text are well-suited for our domain of multi-party spontaneous spoken dialogues, or whether the patterns of word usage in such dialogues can be exploited in order to yield superior term-weighting for our task. To this end, we devise and implement a novel term-weighting approach for multi-party speech called su.idf, based on differing word frequencies among speakers in a meeting. This metric is compared with three popular term-weighting schemes (tf.idf, ridf and Gain), and the metrics are evaluated via an extractive summarization task on both the AMI and ICSI corpora.

A. Popescu-Belis, S. Renals, and H. Bourlard (Eds.): MLMI 2007, LNCS 4892, pp. 156–167, 2007. © Springer-Verlag Berlin Heidelberg 2007
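The weighting step described in the introduction can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a textbook tf.idf (raw term count times log(N/df)), a hypothetical background collection of tokenized documents, and a fallback of df = 1 for terms unseen in that collection. The function names are illustrative only. Each dialogue act is scored by summing the weights of its constituent words, and the top-scoring acts are extracted.

```python
import math
from collections import Counter


def tfidf_weights(meeting_tokens, background_docs):
    """Return {term: tf * idf} for one meeting.

    tf  = raw count of the term in the meeting
    idf = log(N / df), where N is the number of background documents
          and df is the number of those documents containing the term
    """
    n_docs = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))  # count each term at most once per document

    tf = Counter(meeting_tokens)
    weights = {}
    for term, count in tf.items():
        # Assumption: a term unseen in the background collection gets df = 1,
        # which avoids division by zero and gives it the maximum idf.
        weights[term] = count * math.log(n_docs / max(df[term], 1))
    return weights


def score_dialogue_act(tokens, weights):
    """Score a dialogue act as the sum of its term weights."""
    return sum(weights.get(token, 0.0) for token in tokens)


def extract(dialogue_acts, weights, k):
    """Return the indices of the k best-scoring dialogue acts, in meeting order."""
    ranked = sorted(
        range(len(dialogue_acts)),
        key=lambda i: score_dialogue_act(dialogue_acts[i], weights),
        reverse=True,
    )
    return sorted(ranked[:k])
```

Summing weights favours longer dialogue acts; a real system might normalize by length or stop once a summary word budget is reached, neither of which is shown here.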

2 Previous Term Weighting Work

Term weighting methods form an essential part of any IR system. Terms that characterize a given document well and discriminate the document from
