Semantic Context Detection Using Audio Event Fusion


Semantic Context Detection Using Audio Event Fusion: Camera-Ready Version

Wei-Ta Chu,1 Wen-Huang Cheng,2 and Ja-Ling Wu1,2

1 Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan
2 Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 106, Taiwan

Received 31 August 2004; Revised 20 February 2005; Accepted 5 April 2005

Semantic-level content analysis is a crucial issue in achieving efficient content retrieval and management. We propose a hierarchical approach that models audio events over a time series in order to accomplish semantic context detection. Two levels of modeling, audio event and semantic context modeling, are devised to bridge the gap between physical audio features and semantic concepts. In this work, hidden Markov models (HMMs) are used to model four representative audio events, that is, gunshot, explosion, engine, and car braking, in action movies. At the semantic context level, generative (ergodic hidden Markov model) and discriminative (support vector machine (SVM)) approaches are investigated to fuse the characteristics and correlations among audio events, which provide cues for detecting gunplay and car-chasing scenes. The experimental results demonstrate the effectiveness of the proposed approaches and provide a preliminary framework for information mining by using audio characteristics. Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.
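As a concrete illustration of the event-level modeling the abstract describes, an audio segment can be assigned to whichever event HMM gives it the highest likelihood (computed with the standard forward algorithm). The sketch below uses toy two-state, discrete-symbol models with made-up parameters; the paper's actual models use continuous audio features and four event classes, so everything here is hypothetical.

```python
import math

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    via the scaled forward algorithm (pi: initial state probabilities,
    A: state transition matrix, B: per-state emission probabilities)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    s = sum(alpha)
    loglik = math.log(s)
    alpha = [a / s for a in alpha]          # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [B[j][o] * sum(alpha[i] * A[i][j] for i in range(n))
                 for j in range(n)]
        s = sum(alpha)
        loglik += math.log(s)
        alpha = [a / s for a in alpha]
    return loglik

# Toy event models (hypothetical parameters, NOT from the paper).
# The two symbols stand in for quantized audio-feature observations.
models = {
    # "gunshot": persistent states with sharply peaked emissions
    "gunshot": ([0.5, 0.5],
                [[0.9, 0.1], [0.1, 0.9]],
                [[0.8, 0.2], [0.2, 0.8]]),
    # "engine": states mix more and emissions are flatter
    "engine":  ([0.5, 0.5],
                [[0.6, 0.4], [0.4, 0.6]],
                [[0.3, 0.7], [0.7, 0.3]]),
}

def classify_event(obs):
    """Maximum-likelihood event selection among the candidate HMMs."""
    return max(models, key=lambda m: forward_loglik(obs, *models[m]))
```

A steady observation sequence such as `[0, 0, 0, 0]` scores higher under the self-persistent "gunshot" model, while an alternating sequence such as `[0, 1, 0, 1]` favors the "engine" model. In the paper, a second modeling level then fuses such per-segment event decisions over time to detect semantic contexts like gunplay or car-chasing scenes.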

1. INTRODUCTION

With the rapid advances in media creation, storage, and compression technologies, large amounts of multimedia content have been created and disseminated in various ways. Massive multimedia data challenge users in content browsing and retrieval, thereby motivating an urgent need for information-mining technologies. To facilitate effective and efficient multimedia document indexing, many research issues have been investigated. Shot boundary detection algorithms have been extensively studied [1, 2] to discover the structure of video. With an understanding of video structure, video adaptation applications [3] are then developed to manipulate information more flexibly. Moreover, techniques for genre classification are also investigated to facilitate browsing and retrieval. Audio classification and segmentation techniques [4, 5] have been proposed to discriminate between different types of audio, such as speech, music, noise, and silence. Additional work focuses on classifying musical sounds [6] and automatically constructing music snippets [7]. For video content, genres of films [8] and TV programs [9] are automatically classified by exploring various features. Features from audio, video, and text [10] can be exploited jointly to perform content analysis, and multimodal approaches have been proposed to efficiently cope with the access and retrieval issues of multimedia content. On the basis of physical features, the paradigms described above are developed to automatically analyze multimedia

content. However, many problems remain in today's applications. The semantic gap between low-level fe