- 18 pages
RESEARCH ARTICLE
Automatic spatiotemporal and semantic information extraction from unstructured geoscience reports using text mining techniques

Qinjun Qiu 1,2 & Zhong Xie 1,2 & Liang Wu 1,2 & Liufeng Tao 1,2

Received: 15 July 2020 / Accepted: 15 September 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract

A large amount of georeferenced quantitative data from rock and geoscience surveys is buried in geological documents and remains unused. Data analytics and information extraction offer opportunities to use these data to improve our understanding of ore-forming processes and to enhance our knowledge. Extracting spatiotemporal and semantic information from a set of geological documents enables us to build a rich representation of the geoscience knowledge recorded in unstructured text written in Chinese. This paper presents a workflow for spatiotemporal and semantic information extraction: a geological document analysis approach that uses automated techniques to browse and search relevant geological content. The workflow applies spatial and temporal gazetteer matching, pattern-based rules and spatiotemporal relationship extraction to identify and label terms in geological text documents. It represents contextual information as a knowledge graph, extracts a set of relevant tables and figures, and retrieves a list of relevant documents using geological topic information. Text mining techniques are used here to facilitate the analysis of geological knowledge and to show how effectively text analysis supports the rapid assessment of large numbers of documents. Furthermore, keyword suggestions generated automatically from extracted keyword associations are used to reduce document search effort. This research illustrates the usefulness and effectiveness of the developed information extraction workflow and demonstrates the potential of text mining and NLP techniques for geoscience.

Keywords: Geoscience document · Knowledge graph · Geological text mining · Natural language processing
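The abstract names spatial and temporal gazetteer matching, pattern-based rules and spatiotemporal relationship extraction, but gives no implementation details. The following is a minimal Python sketch of how such steps can fit together, not the authors' method. It assumes toy gazetteers (the `GAZETTEERS` entries are hypothetical), uses forward maximum matching (a swapped-in, commonly used longest-match strategy that needs no word segmentation of Chinese text), and applies a deliberately simple rule that pairs every place with every geological time term in the same sentence. The helper names `forward_max_match` and `extract_triples` and the relation label `associated_with` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical toy gazetteers (illustrative only; not the paper's resources).
GAZETTEERS = {
    "SPATIAL": {"湖北省", "武汉", "大别山"},
    "TEMPORAL": {"寒武纪", "早寒武世", "奥陶纪"},
}


@dataclass(frozen=True)
class Mention:
    text: str
    label: str  # key of the gazetteer that matched, e.g. "SPATIAL"
    start: int  # character offset within the sentence
    end: int    # exclusive end offset


def forward_max_match(sentence, gazetteers):
    """Label gazetteer terms by forward maximum matching: at each position,
    take the longest gazetteer entry that starts there. No word segmentation
    is needed, which suits unsegmented Chinese text."""
    # Flatten to term -> label; a term listed in two gazetteers keeps the later label.
    lexicon = {term: label for label, terms in gazetteers.items() for term in terms}
    max_len = max(len(term) for term in lexicon)
    mentions, i = [], 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + n]
            if candidate in lexicon:
                mentions.append(Mention(candidate, lexicon[candidate], i, i + n))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no entry starts here; advance one character
    return mentions


def extract_triples(text, gazetteers):
    """Toy spatiotemporal rule (hypothetical): pair every spatial mention with
    every temporal mention in the same sentence, splitting on the Chinese full
    stop '。'. Returns (place, relation, time) triples for a knowledge graph."""
    triples = []
    for sentence in text.split("。"):
        mentions = forward_max_match(sentence, gazetteers)
        places = [m.text for m in mentions if m.label == "SPATIAL"]
        times = [m.text for m in mentions if m.label == "TEMPORAL"]
        triples.extend((p, "associated_with", t) for p in places for t in times)
    return triples


if __name__ == "__main__":
    report = "大别山地区早寒武世地层发育。武汉周边见奥陶纪灰岩。"
    for triple in extract_triples(report, GAZETTEERS):
        print(triple)
```

For the two sample sentences ("Early Cambrian strata are well developed in the Dabie Mountains area" and "Ordovician limestone occurs around Wuhan"), the sketch yields one triple per sentence, linking 大别山 to 早寒武世 and 武汉 to 奥陶纪. Longest-match selection keeps 早寒武世 (Early Cambrian) as a single temporal term rather than a shorter fragment. A real pipeline would replace the co-occurrence rule with the kind of pattern-based rules the paper describes.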
Communicated by: H. Babaie

* Liufeng Tao
[email protected]

1 School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
2 National Engineering Research Center of Geographic Information System, Wuhan 430074, China

Introduction

Large collections of publicly available geoscience documents and reports are an important data source. They pose considerable challenges but also offer great opportunities, because they can support geological research in a chosen target area. These natural language documents and reports often contain a large amount of explicit and implicit geological knowledge about ore-forming processes or about where geological structures occurred (Holden et al. 2019). Research in mathematical geoscience aims to process georeferenced quantitative information and data for information extraction and knowledge discovery (Lima et al. 2017; Wang et al. 2018a, 2018b). Given the content in geological documents, automatically extracting