A Scalable Grid Computing Framework for Extensible Phylogenetic Profile Construction

Abstract. Research in the life sciences has firmly established itself as a Big Data discipline. Beyond the expected domain-specific requirements, this has made scalability one of the most crucial aspects of any state-of-the-art bioinformatics framework. Sequence alignment and the construction of phylogenetic profiles are common tasks in a wide range of life science analyses because, given an arbitrarily large volume of genomes, they can provide useful insights into the functionality and relationships of the entities involved. Owing to its inherent complexity, this process is often a computational bottleneck in existing solutions. Our proposed distributed framework performs both tasks with significant speed-up by making efficient use of the Grid Computing resources provided by EGI. The overall workflow is fully automated, which makes it user-friendly, and fully detached from the end user's terminal, since all computations take place on Grid worker nodes.

1 Introduction

Over the last decade, the amount of available data in the life sciences domain has increased exponentially and is expected to keep growing at an ever-accelerating pace. This increase in data acquisition creates a pressing need for scalable methods to interpret the data, a need that traditional systems cannot meet because they lack the necessary computational power and network throughput. Recent literature reports several efforts towards developing new, distributed methods for a number of bioinformatics workflows through the use of HPC systems and paradigms such as MapReduce [6]. However, most, if not all, of these efforts require setting up a rather complex computing system, as well as the expertise to manage and update an independent software project, since most implementations differ radically from their vanilla counterparts. This is in stark contrast with the situation of most life science researchers, who lack the expertise needed to use and manage such systems. As a result, and despite the overall advantages of these frameworks, their actual use is fairly limited.

© IFIP International Federation for Information Processing 2016. Published by Springer International Publishing Switzerland 2016. All Rights Reserved. L. Iliadis and I. Maglogiannis (Eds.): AIAI 2016, IFIP AICT 475, pp. 455–462, 2016. DOI: 10.1007/978-3-319-44944-9_39

E. Stergiadis et al.

To overcome these issues while still providing the computing power needed for complex analyses, we developed a bioinformatics framework on top of a Grid architecture that can perform common comparative genomics workflows at massive scale using EGI resources. Special care has been taken to make the framework as automated as possible, improving its user-friendliness and thereby facilitating wider use by the scientific community. Moreover, every major submodule in the framework uses the latest vanilla version available to the community, in order to
