Querying subjective data

SPECIAL ISSUE PAPER

Yuliang Li¹ · Aaron Feng¹ · Jinfeng Li¹ · Shuwei Chen¹ · Saran Mumick² · Alon Halevy³ · Vivian Li¹ · Wang-Chiew Tan¹

Received: 31 January 2020 / Revised: 4 August 2020 / Accepted: 28 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract

Online users are constantly seeking experiences, such as a hotel with clean rooms and a lively bar, or a restaurant for a romantic rendezvous. However, e-commerce search engines only support queries involving objective attributes such as location, price, and cuisine, and any experiential data is relegated to text reviews. In order to support experiential queries, a database system needs to model subjective data. Users should be able to pose queries that specify subjective experiences using their own words, in addition to conditions on the usual objective attributes. This paper introduces OpineDB, a subjective database system that addresses these challenges. We introduce a data model for subjective databases. We describe how OpineDB translates subjective queries against the subjective database schema, which is done by matching the user query phrases to the underlying schema. We also show how the experiential conditions specified by the user can be combined and the results aggregated and ranked. We demonstrate that subjective databases satisfy user needs more effectively and accurately than alternative techniques through experiments with real data of hotel and restaurant reviews.

Keywords Subjective data · Opinion mining/extraction · Text databases · Natural language processing

¹ Megagon Labs, Mountain View, USA
² University of Pennsylvania, Philadelphia, USA
³ Facebook AI, California, USA

1 Introduction

Database systems model entities in a domain with a set of attributes. Typically, these attributes are objective in the sense that they have an unambiguous value for a given entity, even if the value is unknown to the database, known only probabilistically, or recorded erroneously. Typical examples of such attributes include product specifications, details of a purchase order, or values of sensor readings. The Boolean nature of database query languages reinforces the primacy of objective data: a tuple is either in the answer to the query or is not, but cannot be anywhere in between. However, the world also abounds with subjective attributes that have no unambiguous value and yet are of great interest to users. Examples of such attributes occur in a variety of domains, including the cleanliness of hotel rooms, the difficulty level of an online course, or whether a restaurant is romantic. Currently, the data for these attributes, when it exists, is typically left in text reviews or social media, but not modeled in the database and th