Design of a Flexible, User Friendly Feature Matrix Generation System and its Application on Biomedical Datasets


M. Ghorbani · S. Swift · S. J. E. Taylor · A. M. Payne

Received: 26 June 2018 / Accepted: 23 March 2020 / © The Author(s) 2020

Abstract The generation of a feature matrix is the first step in conducting machine learning analyses on complex data sets such as those containing DNA, RNA or protein sequences. These matrices contain information for each object, which has to be identified using complex algorithms to interrogate the data. They are normally generated by combining the results of running such algorithms across various datasets from different and distributed data sources. Thus, for non-computing experts, the generation of such matrices proves a barrier to employing machine learning techniques. Further, since datasets are becoming larger, this barrier is augmented by the limitations of the single personal computer most often used by investigators to carry out such analyses. Here we propose a user-friendly system to generate feature matrices in a way that is flexible, scalable and extendable. Additionally, by making use of the Berkeley Open Infrastructure for Network Computing (BOINC) software, the process can be sped up using distributed volunteer computing, which is possible in most institutions. The system makes use of the Grid and Cloud User Support Environment (gUSE) combined with the Web Services Parallel Grid Runtime and Developer Environment Portal (WS-PGRADE) to create workflow-based science gateways that allow users to submit work to the distributed computing resources. This report demonstrates the use of our proposed WS-PGRADE/gUSE BOINC system to identify features to populate matrices from very large DNA sequence data repositories. However, we propose that this system could be used to analyse a wide variety of feature sets including image, numerical and text data.

Keywords BOINC · Desktop grid · DNA sequence · Feature subset selection · Machine learning · High performance computing · WS-PGRADE · gUSE · DNA feature identification · Speedup

M. Ghorbani · S. Swift · S. J. E. Taylor · A. M. Payne (*) Department of Computer Science, College of Engineering, Design and Physical Sciences, Brunel University London, Kingston Lane, Uxbridge, Middx UB8 3PH, UK e-mail: [email protected]

1 INTRODUCTION

Machine learning techniques have proved to be important tools in many research areas for aiding knowledge discovery from complex data sets. Examples of their far-reaching impact and methods have been extensively reported [1–3]. Machine learning analysis, however, is preceded by the important stage of feature matrix generation, which selects the features to be analyzed from these data sets. In some cases these features are simply a subset of the features in the data set, chosen using expert knowledge of the subject area from which the data was collected. Often, however, the features are generated by running algorithms across the data to draw out derived features or values not in the original data s
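As a rough illustration of what a feature matrix derived from sequence data can look like, the following minimal Python sketch computes one row per DNA sequence and one column per derived feature. It is not the system proposed in this paper: the three features (sequence length, GC content, CG dinucleotide count), the function names and the example sequences are all assumptions chosen for illustration, and the code runs on a single machine rather than on BOINC.

```python
# Hypothetical illustration only: the features, names and data below are
# assumptions, not the authors' algorithms or datasets.
from typing import Callable, Dict, List


def length(seq: str) -> float:
    """Sequence length in bases."""
    return float(len(seq))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def cg_dinucleotides(seq: str) -> float:
    """Number of 'CG' dinucleotides found while sliding one base at a time."""
    return float(sum(seq[i:i + 2] == "CG" for i in range(len(seq) - 1)))


# Each entry is one column of the feature matrix (insertion order is kept).
FEATURES: Dict[str, Callable[[str], float]] = {
    "length": length,
    "gc_content": gc_content,
    "cg_dinucleotides": cg_dinucleotides,
}


def build_feature_matrix(sequences: Dict[str, str]) -> List[List[float]]:
    """Return one row per sequence and one column per feature function."""
    return [
        [feature(seq.upper()) for feature in FEATURES.values()]
        for seq in sequences.values()
    ]


# Made-up example sequences.
sequences = {"seq1": "ATGCGC", "seq2": "AAAATTTT", "seq3": "CGCGCG"}
matrix = build_feature_matrix(sequences)

print(list(FEATURES))  # column names
for name, row in zip(sequences, matrix):
    print(name, row)
```

Because each row depends only on its own sequence, the rows can be computed independently of one another, which is the property that makes this kind of workload a natural candidate for distribution across many machines.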
