Thesaurus matching in electronic commerce



Thomas Cerqueus¹ · Jonathan Bonnaud¹ · Oleksandr Dashkov¹ · Emmanuel Morin²

Accepted: 28 August 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
This paper tackles the problem of aligning e-commerce thesauri. It defines three alignment techniques that can be combined to increase effectiveness and reduce execution time. It also introduces a filtering technique that reduces the number of candidates returned to the final user. We report a set of evaluations conducted with real-world data. Results show that the proposed techniques outperform SCHEMA, the state-of-the-art approach. They also drastically reduce execution time, which makes them more usable in real-world applications.

Keywords: Thesaurus · Category · Matching · Alignment · E-commerce

1 Introduction

Electronic commerce has become unavoidable for retailers. A retailer can send her products to various platforms such as online shopping websites, marketplaces and comparison shopping engines. Major platforms categorise products in accordance with a thesaurus, that is, a tree-like structure of categories. Generally, a retailer sells her products on her own online shopping website (using her own thesaurus) and then targets third-party platforms, also called channels (e.g., Google Shopping,¹ Amazon² and Criteo³).
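To make the notion of a thesaurus concrete, the following minimal sketch represents one as a tree of categories and enumerates the root-to-category path of every node. It is an illustration only, not the representation used in this work: the `Category` class, the `paths` helper and the category names are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Category:
    """One node of a thesaurus: a named category with optional sub-categories."""
    name: str
    children: List["Category"] = field(default_factory=list)

    def add(self, name: str) -> "Category":
        """Create a sub-category, attach it to this node and return it."""
        child = Category(name)
        self.children.append(child)
        return child


def paths(node: Category, prefix: Optional[List[str]] = None) -> Iterator[List[str]]:
    """Yield the root-to-node path of every category, depth-first."""
    current = (prefix or []) + [node.name]
    yield current
    for child in node.children:
        yield from paths(child, current)


# Hypothetical retailer thesaurus; the category names are invented.
retailer = Category("Clothing")
shoes = retailer.add("Shoes")
shoes.add("Sneakers")
retailer.add("Jackets")

all_paths = [" > ".join(p) for p in paths(retailer)]
for p in all_paths:
    print(p)
```

In this picture, a channel thesaurus is a second tree of the same kind, typically with different names and a different shape.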

¹ https://www.google.com/shopping.
² https://www.amazon.com.
³ http://www.criteo.com.

* Thomas Cerqueus
Jonathan Bonnaud
Oleksandr Dashkov
Emmanuel Morin: emmanuel.morin@univ-nantes.fr

¹ Lengow, Nantes, France
² LS2N - University of Nantes, CNRS, Nantes, France




In order to send her products while complying with the specificities of the channels, the retailer has to categorise her products in accordance with each channel’s thesaurus. This task is not practicable when the number of products reaches the thousands. A better approach consists in aligning the retailer’s and the channel’s thesauri. The task of aligning e-commerce thesauri is complex, as it requires the retailer to find, for each of her categories, the most similar category in the channel thesaurus.

In this work, we aim at defining a solution to semi-automatically align thesauri. We do not look for a completely automatic solution, as we think it is crucial that retailers keep the decision-making power: the retailers must attest that the alignment fits their requirements. We also aim at defining a threshold-free approach. At the very least, we want to avoid leaving the user of our solution with the burdensome decision of setting threshold values (which are often dataset-dependent). Finally, we want to limit the number of candidates presented to retailers in order not to overwhelm them with many inaccurate categories.

Many efforts have been dedicated to the problem of ontology and schema alignment [4, 9, 18]. The approaches defined in this field could be used for the a