Publishing Without Publishers: A Decentralized Approach to Dissemination, Retrieval, and Archiving of Data


1 Department of Humanities, Social and Political Sciences, ETH Zurich, Zürich, Switzerland
2 Department of Computer Science, VU University Amsterdam, Amsterdam, The Netherlands
3 Swiss Institute of Bioinformatics, Geneva, Switzerland
4 Yale University School of Medicine, New Haven, CT, USA
5 Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA

Abstract. Making scientific results available and archiving them is for the most part still considered the task of classical publishing companies, despite the fact that classical forms of publishing centered around printed narrative articles no longer seem well suited to the digital age. In particular, there are currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. Here we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control by central authorities such as publishing companies. Based on a novel combination of existing concepts and technologies, we present a server network that stores and archives data in a decentralized manner in the form of nanopublications, an RDF-based format for representing scientific data. We show how this approach allows researchers to publish, retrieve, verify, and recombine datasets of nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used for the Semantic Web in general. Evaluation of the current small network shows that this system is efficient and reliable.

1 Introduction

© Springer International Publishing Switzerland 2015. M. Arenas et al. (Eds.): ISWC 2015, Part I, LNCS 9366, pp. 656–672, 2015. DOI: 10.1007/978-3-319-25007-6_38

Modern science increasingly depends on datasets, which are nevertheless left out of the classical way of publishing, i.e., through narrative (printed or online) articles in journals or conference proceedings. This means that the publications that describe scientific findings get disconnected from the data they are based on, which can seriously impair the verifiability and reproducibility of their results. Addressing this issue raises a number of practical problems: How should one publish scientific datasets, and how can one refer to them in the respective scientific publications? How can we be sure that the data will remain available in the future, and how can we be sure that data we find on the Web have not been corrupted or tampered with? Moreover, how can we refer to specific entries or subsets of large datasets? To address some of these problems, a number of scientific data repositories have appeared, such as Figshare and Dryad.1 Furthermore, Digital Object Identifiers (DOIs) have been advocated for use not only with articles but also with scientific data [22]. While these services certainly improve the situation of scientific data, in particular when combined with