Parallelization of the self-organized maps algorithm for federated learning on distributed sources



Ivan Kholod¹ · Andrey Rukavitsyn¹ · Alexey Paznikov¹ · Sergei Gorlatch²

Accepted: 2 November 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

This paper describes a formally based approach to parallelizing the Kohonen algorithm, which is used for federated learning in a special kind of neural network: Self-Organizing Maps. Our approach enables executing the parallel version of the algorithm on distributed data sources, taking into account how the data are distributed across the nodes. In contrast to traditional approaches, we distinguish two kinds of data distribution, horizontal and vertical; for both, our approach avoids gathering data in a single storage and instead moves computations closer to the data source nodes. This reduces the execution time of the algorithm, the network traffic, and the risk of unauthorized access to the data during transmission. Our experimental evaluation demonstrates the advantages of the approach.

Keywords  Self-Organizing Maps (SOM) · Neural networks · Distributed data · Federated learning · Kohonen algorithm

¹ Saint Petersburg Electrotechnical University "LETI", Saint Petersburg, Russia
² University of Muenster, Muenster, Germany
* Corresponding author: Ivan Kholod
Sergei Gorlatch: gorlatch@uni-muenster.de

1 Introduction

Many companies currently organize their work in a data-driven manner, i.e., they employ data from various sources to optimize their business. This makes it necessary to build platforms for data processing, which include machine learning, enterprise data warehouses, data clouds, etc. A typical architecture of data processing platforms includes, as in [1]:

– data sources, which contain domain-oriented data;
– a platform, which gathers and processes all data;
– consumers, which solve different data-driven business tasks.

Figure 1 shows an example platform that processes data from distributed sources. There are two possible kinds of data distribution used in business domains:

– horizontal distribution, shown in Fig. 1a: the data sources are related to the same business domain and contain data about different facts in this domain;
– vertical distribution, shown in Fig. 1b: the data sources are related to different business domains and contain data about the same facts in those domains.

The data platform in Fig. 1 receives data from the data sources. Its goals are:

– receiving data from data sources in the same or different domains;
– enriching and transforming the source data into trustworthy data that address the needs of diverse consumers;
– providing services (including data analysis based on the data sets) to the broad community of consumers.

This current organization of data processing platforms has some weaknesses; in particular, it leads to an increase in total processing time, intensi