Predicting the Olea pollen concentration with a machine learning algorithm ensemble


ORIGINAL PAPER

José María Cordero 1 & J. Rojo 2 & A. Montserrat Gutiérrez-Bustillo 3 & Adolfo Narros 1 & Rafael Borge 1

Received: 11 May 2020 / Revised: 14 September 2020 / Accepted: 31 October 2020
© ISB 2020

Abstract

According to the World Health Organization, air pollution in large cities causes numerous diseases and millions of deaths annually. Pollen exposure is related to allergic diseases, which makes its prediction a valuable tool for assessing the risk posed by aeroallergens. However, airborne pollen concentrations are difficult to predict because of the inherent complexity of the relationships among biotic and environmental variables. In this work, a stochastic approach based on supervised machine learning algorithms was applied to forecast daily Olea pollen concentrations in the Community of Madrid, central Spain, from 1993 to 2018. Firstly, individual Light Gradient Boosting Machine (LightGBM) and artificial neural network (ANN) models were applied to predict the day of the year (DOY) on which the peak of the pollen season occurs. The estimated average peak dates were 149.1 ± 9.3 DOY for LightGBM and 150.1 ± 10.8 DOY for ANN, close to the observed value (148.8 ± 9.8 DOY). Secondly, the daily pollen concentrations during the entire pollen season were calculated using an ensemble of a two-step GAM followed by LightGBM and ANN. The predicted daily pollen concentrations showed a coefficient of determination (r²) above 0.75 (goodness of the model following cross-validation). The predictors included in the ensemble models were meteorological variables, phenological metrics, site-specific characteristics, and preceding pollen concentrations. The models are state-of-the-art in machine learning, and the results show their potential for deployment to understand and predict pollen risk levels during the main olive pollen season.

Keywords Air quality · Pollen exposure · Pollen prediction · Neural networks · Boosted trees
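The two evaluation quantities quoted in the abstract, the peak day of the year and the coefficient of determination, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `peak_doy` and `r_squared` and all numbers are hypothetical, and the sketch assumes that r² is computed as 1 − SS_res/SS_tot, which the abstract does not specify.

```python
from statistics import mean, stdev


def peak_doy(daily):
    """Return the 1-based day of year with the highest concentration.

    Assumes `daily` is a list of daily concentrations starting on 1 January.
    """
    return max(range(len(daily)), key=lambda i: daily[i]) + 1


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical peak dates (DOY) for five seasons; not data from the paper.
observed_peaks = [140, 150, 155, 145, 160]
predicted_peaks = [142, 149, 153, 148, 158]

# Mean and sample standard deviation, reported in the "mean ± sd" form.
print(f"observed:  {mean(observed_peaks):.1f} ± {stdev(observed_peaks):.1f} DOY")
print(f"predicted: {mean(predicted_peaks):.1f} ± {stdev(predicted_peaks):.1f} DOY")
print(f"r2 = {r_squared(observed_peaks, predicted_peaks):.3f}")
```

In a cross-validated setting, the same `r_squared` function would be applied only to predictions for held-out seasons, so that the score reflects performance on data the model did not see during fitting.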

* José María Cordero [email protected]

1 Universidad Politécnica de Madrid (UPM). ETSII-UPM, José Gutiérrez Abascal 2, 28006 Madrid, Spain

2 University of Castilla-La Mancha. Institute of Environmental Sciences (Botany), Avda. Carlos III s/n, E-45071 Toledo, Spain

3 Department of Pharmacology, Pharmacognosy and Botany, Complutense University of Madrid, Ciudad Universitaria, 28040 Madrid, Spain

Introduction

Poor air quality is associated with mortality and morbidity through respiratory causes, including cardiovascular diseases and lung cancer (Burnett et al. 2018; Cole-Hunter et al. 2018). According to the World Health Organization (WHO), 4.2 million premature deaths worldwide every year can be attributed to air pollution (World Health Organization 2019). Moreover, the contribution of air pollution to premature mortality could double by 2050 (Lelieveld et al. 2015). Over 500,000 premature annual deaths are attributed to population exposure to PM2.5, NO2,