PRODUCTION MANAGEMENT
A deep q-learning-based optimization of the inventory control in a linear process chain

M.-A. Dittrich¹ · S. Fohlmeister¹

Received: 30 July 2020 / Accepted: 6 November 2020
© The Author(s) 2020
Abstract
Due to growing globalized markets and the resulting globalization of production networks across different companies, inventory and order optimization is becoming increasingly important in the context of process chains. Thus, an adaptive and continuously self-optimizing inventory control on a global level is necessary to overcome the resulting challenges. Advances in sensor and communication technology allow companies to realize a global data exchange to achieve a holistic inventory control. Based on deep q-learning, a method for a self-optimizing inventory control is developed. Here, the decision process is based on an artificial neural network. Its input is modeled as a state vector that describes the current stocks and orders within the process chain. The output represents a control vector that controls orders for each individual station. Furthermore, a reward function, which is based on the resulting storage and late order costs, is implemented for simulation-based decision optimization. One of the main challenges of implementing deep q-learning is the hyperparameter optimization for the training process, which is investigated in this paper. The results show a significant sensitivity for the learning rate α and the exploration rate ε. Based on optimized hyperparameters, the potential of the developed methodology could be shown by significantly reducing the total costs compared to the initial state and by achieving stable control behavior for a process chain containing up to 10 stations.

Keywords Inventory control · Deep q-learning · Process chain · Self-optimizing control · Learning parameters

List of symbols
α            Learning rate
α_d          Decay of the learning rate
a_random     Action at time t chosen randomly with probability ε
a_t          Action at time t chosen by the agent
C_late       Late order costs
C_s,n        Storage costs of station n
cv_dim       Dimension of the control vector/number of neurons in the output layer
e_max        Number of training episodes per training
ε            Exploration rate
ε_d          Decay of the exploration rate
f_c          Total cost factor
f_late       Late order cost factor
f_sc         Storage cost factor
γ            Discount factor
id(…)        Identity function
m            Number of simulated time steps per training episode
MAE          Mean absolute error
MDP          Markov decision process
n            Number of intermediate stations
n_warm-up    Number of warm-up episodes
o_late,n     Number of late orders at station n
o_n          Order placed by station n
o_open,n     Number of open orders within a simulated day at station n
o_q          Interval of the order quantity
o_t          Interval of the order distance
Q(s_t, a_t)  Quality of action a_t in state s_t at time t
ReLU         Rectified linear unit
r_t          Reward at time t
s_dis,n      Disposable stock at station n

* Corresponding author: S. Fohlmeister, [email protected]-hannover.de
¹ Institute of Production Engineering and Machine Tools, An der Universität 2, 30823 Garbsen, Germany
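The abstract states that the reward function is based on the resulting storage and late order costs. The paper's exact cost model is not reproduced in this excerpt; as a rough illustration only, the following Python sketch combines per-station stocks and late orders with the cost factors f_sc and f_late from the list of symbols into a single negative-cost reward. The function name, the default factor values, and the simple linear combination are assumptions, not the authors' implementation.

```python
import numpy as np

def reward(stocks, late_orders, f_sc=1.0, f_late=5.0):
    """Illustrative reward: the negative total cost of one simulated time step.

    stocks      -- stock level per station (units in storage)
    late_orders -- number of late orders per station
    f_sc        -- assumed storage cost factor per stored unit
    f_late      -- assumed cost factor per late order
    """
    storage_costs = f_sc * np.sum(stocks)        # sum of C_s,n over all stations
    late_costs = f_late * np.sum(late_orders)    # late order costs C_late
    return -(storage_costs + late_costs)         # lower total costs -> higher reward

# example call for a three-station chain
print(reward(stocks=[12, 4, 7], late_orders=[0, 2, 1]))
```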
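The list of symbols also defines an exploration rate ε with decay ε_d and a learning rate α with decay α_d. A common way to realize this in deep q-learning is ε-greedy action selection combined with a multiplicative decay of both rates per training episode; the sketch below follows that generic pattern with assumed hyperparameter values and a hypothetical q_values input, and does not reproduce the authors' exact schedule.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def select_action(q_values, epsilon):
    """ε-greedy selection: a_random with probability ε, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # a_random
    return int(np.argmax(q_values))               # a_t = argmax_a Q(s_t, a)

# assumed initial values and multiplicative decays (not taken from the paper)
epsilon, epsilon_d = 1.0, 0.995    # exploration rate ε and its decay ε_d
alpha, alpha_d = 1e-3, 0.999       # learning rate α and its decay α_d

for episode in range(3):           # e_max training episodes
    # ... simulate m time steps, calling select_action() on the network's Q-values
    #     and updating the network with learning rate alpha ...
    epsilon *= epsilon_d           # decay the exploration rate after each episode
    alpha *= alpha_d               # decay the learning rate after each episode
```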