Accessibility, Adaptability, and Extendibility: Dealing with the Small Data Problem

T. Bauer and D. Garcia
Sandia National Laboratories, Albuquerque, NM, USA
[email protected], [email protected]
Abstract

An underserved niche exists for data mining tools in complex analytical environments. We propose three attributes of analytical tool development that facilitate rapid operationalization of new tools into complex, dynamic environments: accessibility, adaptability, and extendibility. Accessibility we define as the ability to load data into an analytical system quickly and seamlessly. Adaptability we define as the ability to apply a tool rapidly to new, unanticipated use cases. Extendibility we define as the ability to create new functionality “in the field” where it is being used and, if needed, harden that new functionality into a new, more permanent user interface. Distributed “big data” systems generally do not optimize for these attributes, creating an underserved niche for new analytical tools. In this paper we will define the problem, examine the three attributes, and describe the architecture of an example system called Citrus that we have built and use that is especially focused on these three attributes.

Keywords: Human factors · Text analysis · Data mining · Analytical tools
1 Introduction

Data mining needs for national security are complex. The industry has seen analytical tool capabilities evolve quickly over the years. A decade ago, the ability to perform modest text analysis over several thousand documents on an individual desktop was considered an accomplishment, and large-scale distributed computing
required highly specialized hardware and staff with engineering degrees. Today, a high-end but still stock laptop can process millions of documents, and a bright high school student can set up a basic Hadoop cluster.

Current conventional wisdom has led many information technology departments serving analytical environments to focus on building large-scale, distributed computational systems. There are advantages to this approach. Consolidating analytical capabilities into a centralized, shared location reduces the need for individual software deployments and data distribution, and it makes the system and data easier to manage. Having a single location to “update everything” also makes it easier for an IT department to deploy new capabilities on a wide scale.

It might seem that this would lead to rapid deployment of new analytical capabilities. After all, if an IT department can put new capabilities in a single place and have them immediately available to every user, this should improve the rapid operationalization of new analytical capabilities. On the one hand, this approach may work for certain computing technologies such as web-based email. In these kinds of situations, the solutions offered are generally “one size fits all” and ther
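To ground the claim that a stock laptop can handle millions of documents, the following is a minimal sketch of single-machine text analysis: a streaming term count over a directory of plain-text files. The corpus path, the one-document-per-file layout, and the counting task itself are illustrative assumptions for this sketch, not features of the Citrus system described later.

```python
# Minimal single-machine text analysis sketch: stream a corpus once and
# keep only the aggregate term counts in memory. The corpus layout (one
# plain-text document per .txt file under corpus/) is assumed for
# illustration only.
import re
from collections import Counter
from pathlib import Path

TOKEN = re.compile(r"[A-Za-z']+")

def term_counts(corpus_dir: str) -> Counter:
    """Return term frequencies across every *.txt file under corpus_dir."""
    counts = Counter()
    for doc in Path(corpus_dir).rglob("*.txt"):
        # Read one document at a time so memory scales with vocabulary
        # size, not with corpus size.
        text = doc.read_text(encoding="utf-8", errors="ignore")
        counts.update(token.lower() for token in TOKEN.findall(text))
    return counts

if __name__ == "__main__":
    for term, n in term_counts("corpus").most_common(10):
        print(f"{term}\t{n}")
```

Because only the running counts stay resident, the working set grows with the vocabulary rather than the document count, which is what makes this kind of workload comfortable on a single machine.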