JS4Geo: a canonical JSON Schema for geographic data suitable to NoSQL databases
Angelo A. Frozza1 · Ronaldo dos S. Mello2
Received: 1 September 2019 / Revised: 28 April 2020 / Accepted: 18 May 2020 / © Springer Science+Business Media, LLC, part of Springer Nature 2020
Abstract
The large volume and variety of data produced in the current Big Data era lead companies to seek solutions for efficient data management. Within this context, NoSQL databases have emerged as a better alternative to traditional relational databases, mainly in terms of scalability and availability of data. A common feature of NoSQL databases is that they are schemaless, i.e., they either impose no schema or have a flexible one. This is attractive for systems that deal with complex data, such as GIS. However, the lack of a schema becomes a problem when applications need to perform tasks such as data validation, data integration, or data interoperability, as there is no standard for schema representation in NoSQL databases. On the other hand, the JSON language stands out as a standard for representing and exchanging data in document NoSQL databases, and JSON Schema is a schema representation language for JSON documents that is also on track to become a standard. JSON Schema, however, does not include spatial data types. To address this limitation, this paper proposes an extension to JSON Schema, called JS4Geo, that allows the definition of schemas for geographic data. We demonstrate that JS4Geo is able to represent schemas of any NoSQL data model, as well as other standards for geographic data, like GML and KML. We also present a case study that shows how a data integration system can benefit from JS4Geo to define local schemas for geographic datasets and generate an integrated global schema.
Keywords Geographic data · NoSQL · JSON · JSON Schema · GeoJSON · JS4Geo
Angelo A. Frozza
[email protected]
Ronaldo dos S. Mello
[email protected]
1 Instituto Federal Catarinense - IFC, Santa Catarina, Brazil
2 Universidade Federal de Santa Catarina - UFSC, Florianópolis, Brazil
Geoinformatica
1 Introduction
We live today in the so-called Big Data era, where large volumes of digital data are produced at a very high speed, stored in a distributed way and shared in different formats [21]. In this context, the Web 2.0 is responsible for the emergence and growth of several applications that process Big Data, such as social networks, Internet of Things (IoT), e-commerce and open data platforms [14, 22]. Many of these applications, such as Twitter, Facebook and OpenStreetMap, among others, make intensive use of geographic data (spatial data and other features of relevant geo-referenced real-world entities) [3]. Due to the inherent heterogeneity in data representation of these applications, companies are investing in a recent family of technologies for Big Data management called NoSQL [25]. These database systems are based on different data models, and their main features are the ability to represent complex data, scalability to manage both large datasets and in