A deep Q-learning portfolio management framework for the cryptocurrency market


S.I.: EMERGING APPLICATIONS OF DEEP LEARNING AND SPIKING ANN

Giorgio Lucarelli (1) · Matteo Borrotti (1,2)

Received: 21 November 2019 / Accepted: 9 September 2020

© The Author(s) 2020

Abstract

Deep reinforcement learning is gaining popularity in many different fields. An interesting application area is the design of dynamic decision-making systems. A possible example is dynamic portfolio optimization, where an agent has to continuously reallocate an amount of funds across a number of different financial assets with the final goal of maximizing return and minimizing risk. In this work, a novel deep Q-learning portfolio management framework is proposed. The framework is composed of two elements: a set of local agents that learn asset behaviours and a global agent that describes the global reward function. The framework is tested on a crypto portfolio composed of four cryptocurrencies. Based on our results, the deep reinforcement learning portfolio management framework has proven to be a promising approach for dynamic portfolio optimization.

Keywords: Deep reinforcement learning · Q-learning · Portfolio management · Dueling double deep Q-networks
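To make the two-level structure described above concrete, the following minimal Python sketch pairs one Q-learning agent per asset with a global reward computed on the whole portfolio. All class and function names are illustrative assumptions, and the tabular update is a simplification: the paper's actual local agents are dueling double deep Q-networks, and its reward definitions are those given in the later sections, not the ones used here.

# Minimal sketch (not the authors' implementation): one local Q-learning agent
# per asset plus a global reward computed on the whole portfolio. Names and the
# tabular update are hypothetical simplifications of the framework in this paper.
import numpy as np

class LocalAgent:
    """Tabular Q-learning agent for a single asset (e.g. buy / hold / sell)."""

    def __init__(self, n_states, n_actions=3, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = np.zeros((n_states, n_actions))   # Q-value table
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection over the asset's Q-values.
        if np.random.rand() < self.epsilon:
            return int(np.random.randint(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (target - self.q[state, action])

def global_reward(asset_returns, weights):
    """Hypothetical global reward: portfolio return under the current weights."""
    return float(np.dot(weights, asset_returns))

In the paper's framework each local agent trades a single cryptocurrency while the global agent evaluates the portfolio as a whole; the sketch only fixes these interfaces, not the learning architecture or the reward shaping.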

Corresponding author: Matteo Borrotti, [email protected]

(1) Department of Economics, Management and Statistics, University of Milano-Bicocca, Piazza dell’Ateneo Nuovo 1, 20126 Milan, Italy

(2) Institute for Applied Mathematics and Information Technologies, National Research Council, Via Alfonso Corti 12, 20133 Milan, Italy

1 Introduction

Nowadays, new developments in Machine Learning (ML) and advances in neuroscience, together with an increasing amount of data and a new generation of computers, are ushering in a new age of Artificial Intelligence (AI). AI researchers currently show a rising interest in a collection of powerful techniques that fall under the umbrella of deep Reinforcement Learning (RL) [4]. The success of deep RL is related to the fact that both biological and artificial agents must achieve goals to survive and be useful. This goal-oriented behaviour is the cornerstone of RL. Such behaviour is based on learning actions that maximize rewards and minimize punishments or losses.

RL relies on interactions between an agent and its environment. The agent must choose actions based on a set of inputs, where the inputs define the states of the environment. The agent tries to optimize the outcomes of these actions over time, which can be either rewards or punishments. This formulation is natural in biological systems, but it has also proven to be highly useful for artificial agents [22]. In fact, the combination of representation learning with goal-oriented behaviour makes deep RL inherently interesting for many different applications. Deep RL approaches have been successfully applied to a range of fields, from image understanding [6] to natural language processing [32]. For example, deep RL has been widely proposed to tackle cyber attacks against Intern