Text Segmentation Model Based LDA and Ontology for Question Answering in Agriculture

Depeng Hu, Wensheng Wang, Shuyu Liu, Nengfu Xie, and GuoWei Yin

Abstract Question answering systems based on text collections have been a research focus in information technology. A significant problem for text collections is how to construct models for texts and their segmentation. Latent Dirichlet Allocation (LDA), an approach to building topic models based on a formal generative model of documents, is heavily cited in the machine learning literature, but its feasibility and effectiveness in information retrieval are mostly unknown. In this paper, we study how to use LDA efficiently to improve answer retrieval. We propose an LDA-based segmentation model within the language modelling framework and evaluate it on text segmentation collections for agriculture cultivation. Gibbs sampling is employed to conduct approximate inference in LDA, and the computational complexity is analyzed. The process of generating answers in the agriculture cultivation question answering system (ACQA_onto) is presented. We demonstrate LDA’s improved expressiveness over traditional QA systems based on information retrieval, with visualizations of answer accuracy.

Keywords Text segmentation • LDA • Domain ontology • Question answering system • Agriculture cultivation

D. Hu (*)
Institute of agriculture information, Chinese Academy of Agriculture Sciences, Beijing 100081, China
Graduate School, Chinese Academy of Agricultural Sciences, Beijing 100081, China

W. Wang • N. Xie • G. Yin
Institute of agriculture information, Chinese Academy of Agriculture Sciences, Beijing 100081, China

S. Liu
Information School, Liaocheng Vocational and Technical College, Liaocheng, Shandong 252000, People’s Republic of China

S. Xu (ed.), Proceedings of 2013 World Agricultural Outlook Conference, DOI 10.1007/978-3-642-54389-0_27, © Springer-Verlag Berlin Heidelberg 2014

27.1 Introduction

Automated question answering (QA) has been an active research field since the early AI applications. Classical question answering systems require the construction of a knowledge database and a rule corpus for reasoning. As computing power has strongly increased, the general methodology has shifted from hand-encoded knowledge bases about simple domains to text collections as the main knowledge source over more complex domains (Question answering in restricted domains: an overview). Today’s knowledge databases benefit immensely from many sophisticated information resources, such as web pages, blogs, and scientific articles. Standard text mining and information retrieval techniques usually rely on word matching and do not take into account the similarity of words or the structure of documents within the corpus. When working with large corpora of documents, it is difficult to comprehend and process all the information contained in them. Building document models is a critical part of any approach to information processing. Typically, documents are represented as