ONOMATOPEDIA: Onomatopoeia Online Example Dictionary System Extracted from Data on the Web


Abstract. Japanese is filled with onomatopoeic words, which describe sounds or actions, like “click” or “bow-wow.” In general, mastering onomatopoeic phrases is hard for foreign speakers, and example-based dictionaries are known to be useful for learning Japanese onomatopoeia. To construct such dictionaries, we need to collect as many examples as possible. This paper proposes an online example-based onomatopoeia dictionary named ONOMATOPEDIA, which comprises extensive example sentences collected from the Web. Web search results often include inappropriate sentences, for example, sentences in which onomatopoeic words are used as nicknames, or sentences that show uncommon usage patterns. We propose a model for extracting sentences that are appropriate as learning examples. Further, we propose a clustering algorithm for sentences containing onomatopoeia that takes into account that an onomatopoeic word can have different meanings depending on the context.

1 Introduction

Onomatopoeia is a word or a group of words that directly expresses a sound, action, or state, such as “click” and “bow-wow.” Japanese is filled with onomatopoeic phrases and has more of them than other languages. The phrases are widely used in news headlines, in conversation, and in Manga (Japanese comic books), because they describe things succinctly. There are two categories: “giongo” and “gitaigo.” Giongo are words that express voices or sounds. Gitaigo are words that express actions, states, or human emotions. Learners of Japanese must master onomatopoeia to make their Japanese more descriptive and expressive. However, mastering the use of onomatopoeia is hard, even for advanced-level learners of Japanese. There are several reasons. One is that Japanese has more “gitaigo,” which express states or human emotions, than other languages. For instance, Japanese “barabara” describes an object’s state of disarray or separation, and “shiiin” is the onomatopoeic expression of absolute silence. Another reason is that most onomatopoeic phrases are rich in meaning, and the meaning depends on the context in which the phrase is used.

Y. Zhang et al. (Eds.): APWeb 2008, LNCS 4976, pp. 601–612, 2008. © Springer-Verlag Berlin Heidelberg 2008

C. Asaga, Y. Mukarramah, and C. Watanabe

An effective way to master onomatopoeia is to read many sentences that contain it. We are, therefore, developing an online example-based onomatopoeia dictionary named ONOMATOPEDIA, which has extensive example sentences collected from the Web. This system targets advanced-level learners who can already communicate in Japanese, so the example sentences are written only in Japanese. Example sentences are collected through a search engine API, using onomatopoeic phrases as the search keywords. In this paper, we describe two important techniques for generating a good-quality example-based onomatopoeia dictionary: collecting appropriate sentences from the Web, and organizing them by onomatopoeic meaning. Inappropriate sentences tend to be collected from search engines using ono