Curating a Document Collection via Crowdsourcing with Pundit 2.0




¹ Università Politecnica delle Marche, Ancona, Italy
² NET7 Internet Open Solutions, Pisa, Italy

Abstract. Pundit 2.0 is a semantic web annotation system that supports users in creating structured data on top of web pages. Annotations in Pundit are RDF triples that users build starting from web page elements, such as text or images. Annotations can be made public, and developers can access and combine them into RDF knowledge graphs, while the authorship of each triple always remains retrievable. In this demo we showcase Pundit 2.0 and demonstrate how it can be used to enhance a digital library by providing a data crowdsourcing platform. Pundit enables users to annotate different kinds of entities and to contribute to the collaborative creation of a knowledge graph. This, in turn, refines the exploration functionalities of the library's faceted search in real time, so that the annotation effort yields immediate added value. Ad-hoc configurations can be used to drive specific visualisations, such as the timeline-map shown in this demo.

Keywords: Semantic annotation · Digital humanities · Pundit · Linked data · Faceted browsing
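The abstract describes a pipeline: each public annotation contributes RDF triples, the triples are merged into a shared knowledge graph without losing who asserted them, and the graph then feeds the library's faceted search. The following is a minimal Python sketch of that flow under stated assumptions. It uses plain tuples rather than an RDF library, and the `Annotation` class, the `build_graph` and `facet_counts` functions, and all example.org URIs are hypothetical illustrations, not Pundit's actual API or data model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

# A triple is (subject, predicate, object); URIs are plain strings in this sketch.
Triple = tuple[str, str, str]


@dataclass
class Annotation:
    """Hypothetical stand-in for one annotation: its author, whether it
    has been made public, and the RDF triples it asserts."""
    annotation_id: str
    author: str
    public: bool
    triples: list[Triple] = field(default_factory=list)


def build_graph(annotations: list[Annotation]) -> dict[Triple, set[str]]:
    """Merge the triples of all public annotations into one graph,
    keeping the set of authors that asserted each triple."""
    graph: dict[Triple, set[str]] = defaultdict(set)
    for ann in annotations:
        if not ann.public:
            continue  # private annotations stay out of the shared graph
        for triple in ann.triples:
            graph[triple].add(ann.author)
    return dict(graph)


def facet_counts(graph: dict[Triple, set[str]], predicate: str) -> Counter:
    """For one property, count how many distinct subjects carry each value
    (graph keys are unique triples): the numbers a facet would display."""
    return Counter(obj for (_, pred, obj) in graph if pred == predicate)


# Hypothetical URIs used only for this example.
MENTIONS = "http://example.org/vocab/mentions"
DOC1, DOC2 = "http://example.org/doc/1", "http://example.org/doc/2"
ROME, PISA = "http://example.org/place/Rome", "http://example.org/place/Pisa"

annotations = [
    Annotation("a1", "alice", True, [(DOC1, MENTIONS, ROME)]),
    Annotation("a2", "bob", True, [(DOC1, MENTIONS, ROME), (DOC2, MENTIONS, PISA)]),
    Annotation("a3", "carol", False, [(DOC2, MENTIONS, ROME)]),  # not public
]

graph = build_graph(annotations)
print(sorted(graph[(DOC1, MENTIONS, ROME)]))  # ['alice', 'bob']
print(facet_counts(graph, MENTIONS)[ROME])     # 1

# Once a new public annotation is merged, the facet counts change.
annotations.append(Annotation("a4", "dave", True, [(DOC2, MENTIONS, ROME)]))
graph = build_graph(annotations)
print(facet_counts(graph, MENTIONS)[ROME])     # 2
```

Keeping a set of authors per triple is one simple way to preserve per-triple authorship when several users assert the same statement; a production system would more likely keep this provenance inside the RDF store itself and query it there.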

1 Introduction

Digital libraries need curated, semantically structured data to provide meaningful exploration and search capabilities. However, while metadata such as document title, authors and main topics are usually present and well curated in digital libraries, a great amount of knowledge remains hidden in the texts, and this knowledge could be of great value for exploring a corpus. Although automatic text annotation services are available and their performance has greatly improved in recent years, human intervention is still needed to refine the extracted data and to add information that automatic tools can hardly capture. Pundit¹ is a semantic annotation tool that combines powerful annotation functionalities, covering comments, tagging, semi-automatic entity markup and linking, and the composition of rich semantic statements that interlink items in a web page (such as text or images) with resources from the LOD cloud or from custom annotation vocabularies. Annotations in Pundit can be made public and then accessed, via REST APIs or SPARQL queries, and combined by developers to form RDF knowledge graphs.

¹ http://thepund.it.

© Springer International Publishing Switzerland 2015
F. Gandon et al. (Eds.): ESWC 2015, LNCS 9341, pp. 102–106, 2015. DOI: 10.1007/978-3-319-25639-9_20

The tool adopts a flexible data model based on RDF and an extension of the Open Annotation model². Pundit is the evolution of previous systems [1,2] and is designed as a configurable annotation service. It addresses online annotation communities by allowing customisation of both the user interface, by activating or deactivating annotation functionalities, and the annotation vocabularies, so that community administrators can decide which properties and resources may be used in composing annotations. Domain-specific annotation environments can b