(2020) 20:710

RESEARCH ARTICLE

Open Access

Evaluation of incomplete maternal smoking data using machine learning algorithms: a study from the Medical Birth Registry of Norway

Liv Grøtvedt1*, Grace M. Egeland2,3, Liv Grimstvedt Kvalvik3,4 and Christian Madsen1

Abstract

Background: The Medical Birth Registry of Norway (MBRN) provides national coverage of all births. While retrieval of most of the information in the birth records is mandatory, a mother may decline to provide information on her smoking status. The proportion of women with unknown smoking status varied greatly over time, between hospitals, and by demographic groups. We investigated whether incomplete data on smoking in the MBRN may have contributed to a biased smoking prevalence.

Methods: In a study population of all 904,982 viable and singleton births during 1999–2014, we investigated the main predictors of unknown maternal smoking status using linear multivariable regression. Thereafter, we applied machine learning to predict annual smoking prevalence (95% CI) in the group with unknown smoking status, assuming missing-not-at-random.

Results: Overall, the proportion of women with unknown smoking status was 14.4%. Compared with women of Nordic origin, women from Europe outside the Nordic region had a 15% (95% CI 12–17%) higher adjusted risk of unknown smoking status. The corresponding increases were 17% (95% CI 15–19%) for women from Asia and 26% (95% CI 23–29%) for women from Africa. The most important predictors of maternal smoking in the machine learning models were education, ethnic background, marital status and birth weight. When observed and predicted smoking prevalence were combined, the estimated annual prevalence differed from that observed among women with known smoking status by −5.5 to 1.1%.

Conclusion: The predicted total smoking prevalence was only marginally modified compared to the observed prevalence in the group with known smoking status. This implies that MBRN data may be trusted for health surveillance and research.

Keywords: Pregnancy, Smoking, Hospitals, Ethnic groups, Education, Birth weight, Machine learning, Informed consent
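The step of combining observed and predicted smoking prevalence can be illustrated with a small arithmetic sketch. This is not the authors' code: it assumes the total prevalence is a weighted average of the observed prevalence among women with known status and the predicted prevalence among women with unknown status, weighted by each group's share of births. The function names and the two prevalence values in the example are hypothetical; only the 14.4% share of unknown status comes from the abstract. The abstract does not state whether the reported change is absolute or relative, so the sketch returns both.

```python
def combined_prevalence(p_known: float, p_unknown_pred: float, frac_unknown: float) -> float:
    """Weighted average of observed and predicted smoking prevalence.

    p_known        -- observed prevalence among women with known smoking status
    p_unknown_pred -- model-predicted prevalence among women with unknown status
    frac_unknown   -- share of all births with unknown smoking status
    All three arguments are proportions in [0, 1].
    """
    for name, value in (("p_known", p_known),
                        ("p_unknown_pred", p_unknown_pred),
                        ("frac_unknown", frac_unknown)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion in [0, 1], got {value}")
    return (1.0 - frac_unknown) * p_known + frac_unknown * p_unknown_pred


def prevalence_change(p_known: float, p_total: float) -> tuple[float, float]:
    """Return (absolute change, relative change) of the total versus the observed prevalence."""
    absolute = p_total - p_known
    relative = absolute / p_known if p_known > 0 else float("nan")
    return absolute, relative


if __name__ == "__main__":
    # Hypothetical prevalences for one year; 0.144 is the overall share of
    # unknown smoking status reported in the abstract.
    p_total = combined_prevalence(p_known=0.10, p_unknown_pred=0.15, frac_unknown=0.144)
    absolute, relative = prevalence_change(0.10, p_total)
    print(f"combined prevalence: {p_total:.4f}")
    print(f"absolute change: {absolute:+.4f}, relative change: {relative:+.1%}")
```

With these hypothetical inputs the combined prevalence is about 10.7%. Because the unknown group is a minority of births, even a clearly higher predicted prevalence in that group moves the total only modestly, which is consistent with the marginal modification the authors report.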

* Correspondence: [email protected]
1 Department of Health and Inequality, Norwegian Institute of Public Health, Sandakerveien 24c, Bygg B, 0473 Oslo, Norway
Full list of author information is available at the end of the article

© The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and