Introducing a New Scalable Data-as-a-Service Cloud Platform for Enriching Traditional Text Mining Techniques by Integrat


1 High-Performance Computing Center Stuttgart, Nobelstr. 19, 70569 Stuttgart, Germany {cheptsov,tenschert}@hlrs.de
2 Institute of the Society for the Promotion of Applied Information Sciences at the Saarland University, Martin-Luther-Str. 14, 66111 Saarbrücken, Germany [email protected]
3 University of Ulm, Institute of Artificial Intelligence, 89069 Ulm, Germany [email protected]
4 Objectivity, Inc., 3099 North First Street, Suite 200, San Jose, CA 95134, USA [email protected]
5 derivo GmbH, James-Franck-Ring, 89081 Ulm, Germany [email protected]

Abstract. A good deal of the digital data produced in academia, commerce and industry consists of raw, unstructured text, such as Word documents, Excel tables, emails, and web pages, much of it written in natural language. An important analytical task in a number of scientific and technological domains is to retrieve information from text data in order to gain deeper insight into its content and to obtain useful, often not explicitly stated knowledge and facts related to a particular domain of interest. The major challenge is the size, structural complexity, and update frequency of the analysed text sets (i.e., the ‘big data’ aspect), which make the use of traditional analysis techniques and tools impossible. We introduce an innovative approach to analysing unstructured text data. It improves traditional data mining techniques by adopting algorithms from ontological domain modelling, natural language processing, and machine learning. The technique is designed with parallelism in mind, which allows for high performance on large-scale Cloud computing infrastructures.

Keywords: Data-as-a-Service, Text Mining, Ontology Modelling, Cloud computing.

1 Introduction

Modern IT technologies are becoming increasingly data-centric, fostered by the broad availability of platforms for data acquisition, collection and storage. The concepts of linked and open data have enabled a fundamentally new dimension of data analysis, which is no longer limited to internal document collections, i.e., “local data”, but comprises a number of heterogeneous data sources, in particular from the Web, i.e., “global data”. However, existing data processing and analysis technologies are still far from being able to scale to the demands of global data and, in the case of large industrial corporations, even of local data; this is the core of the “big data” problem. The design of current data analysis algorithms therefore needs to be reconsidered in order to scale to big data demands. The problem has two major aspects: (1) the rigid design of current algorithms makes it impossible to integrate them with other techniques that would help increase the analysis quality, and (2) sequential design of the algorithms pr

Z. Huang et al. (Eds.): WISE 2013 Workshops 2013, LNCS 8182, pp. 62–74, 2014. © Springer-Verlag Berlin Heidelberg 2014
