Flexible Sliding Windows for Kernel Regression Based Bus Arrival Time Prediction

Given a set of historical bus trajectories D and a partially observed bus trajectory S up to position l on the bus route, kernel regression (KR) is a non-parametric approach which predicts the arrival time of the bus at location \(l+h\) (\(h>0\) ) by a

  • PDF / 1,528,789 Bytes
  • 17 Pages / 439.37 x 666.142 pts Page_size
  • 46 Downloads / 200 Views

DOWNLOAD

REPORT


Abstract. Given a set of historical bus trajectories D and a partially observed bus trajectory S up to position l on the bus route, kernel regression (KR) is a non-parametric approach which predicts the arrival time of the bus at location l+h (h > 0) by averaging the arrival times observed at same location in the past. The KR method does not weights the historical data equally but it gives more preference to the more similar trajectories in the historical data. This method has been shown to outperform the baseline methods such as linear regression or k-nearest neighbour algorithms for bus arrival time prediction problems [9]. However, the performance of the KR approach is very sensitive to the method of evaluating similarity between trajectories. General kernel regression algorithm looks back to the entire trajectory for evaluating similarity. In the case of bus arrival time prediction, this approach does not work well when outdated part of the trajectories does not reflect the most recent behaviour of the buses. In order to solve this issue, we propose an approach that considers only recent part of the trajectories in a sliding window for evaluating the similarity between them. The approach introduces a set of parameters corresponding to the window lengths at every position along the bus route determining how long we should look back into the past for evaluating the similarity between trajectories. These parameters are automatically learned from training data. Nevertheless, parameter learning is a time-consuming process given large training data (at least quadratic in the training size). Therefore, we proposed an approximation algorithm with guarantees on error bounds to learn the parameters efficiently. The approximation algorithm is an order of magnitude faster than the exact algorithm. In an experiment with a real-world application deployed for Dublin city, our approach significantly reduced the prediction error compared to the state of the art kernel regression algorithm.

1

Introduction

Recently, bus arrival time prediction using GPS data has become an important problem attracting many researchers from both academia and industry labs [9,11]. Beside having important application in urban transportation management systems, GPS data providing accurate real-time locations of buses is very cheap and easy to collect. In such system, GPS devices equipped on-board continuously update real-time locations of the buses in a reasonable fine-grained c Springer International Publishing Switzerland 2015  A. Bifet et al. (Eds.): ECML PKDD 2015, Part III, LNAI 9286, pp. 68–84, 2015. DOI: 10.1007/978-3-319-23461-8 5

Flexible Sliding Windows for Kernel Regression

69

time resolution (from seconds to minutes). Updated locations of the buses are used to predict bus arrival time at any location of the trajectory. Prediction can be used to provide urban citizens with real-time information about bus arrival time at any bus stop or to estimate the likelihood of bus bunching from which bus operators can direct bus drivers to avoid tho