Machine learning to predict retention time of small molecules in nano-HPLC


RESEARCH PAPER

Sergey Osipenko 1 & Inga Bashkirova 1 & Sergey Sosnin 1 & Oxana Kovaleva 1 & Maxim Fedorov 1 & Eugene Nikolaev 1 & Yury Kostyukevich 1

Received: 26 May 2020 / Revised: 29 July 2020 / Accepted: 20 August 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract Retention time is an important parameter for compound identification in untargeted LC-MS screening. Precise retention time prediction facilitates the annotation process and is well established in proteomics. For small molecules, however, prediction accuracy has long been limited by the lack of available experimental data. Recently introduced large databases of small-molecule retention times make reliable machine learning–based predictions possible across the whole diversity of compounds. Applying simple projections may extend these predictions to various LC systems and conditions. In this work, we describe a combined approach to predicting retention times for nano-HPLC: binary and regression gradient boosting models trained on the METLIN small-molecule dataset are applied in sequence, and the results are then projected onto nano-HPLC separations using a small number of easily available compounds. The proposed model achieves a mean absolute error of 46 s, outperforming previous attempts to use machine learning for retention time prediction. After transfer to nano-LC conditions, the overall median absolute (relative) error is less than 155 s (10.8%). To illustrate the applicability of the described approach, we eliminated on average 25 to 42% of false positives using a filter threshold derived from ROC curves. Thus, the proposed approach should be used in addition to other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.

Keywords Retention time prediction · Nano-HPLC · Machine learning
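As an illustration of the projection step mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the "simple projection" is a least-squares straight line fitted on a few calibration compounds, which the abstract does not specify. The function names and all numbers are hypothetical.

```python
from statistics import mean, median


def fit_projection(predicted_rt, measured_rt):
    """Fit measured = slope * predicted + intercept by ordinary least squares.

    Assumption: a linear mapping is used; the abstract only says "simple projection".
    """
    mx, my = mean(predicted_rt), mean(measured_rt)
    sxx = sum((x - mx) ** 2 for x in predicted_rt)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted_rt, measured_rt))
    slope = sxy / sxx
    return slope, my - slope * mx


def project(predicted_rt, slope, intercept):
    """Map model-predicted retention times onto the target LC system."""
    return [slope * x + intercept for x in predicted_rt]


def median_errors(true_rt, projected_rt):
    """Return (median absolute error in s, median relative error in %)."""
    abs_err = [abs(p - t) for p, t in zip(projected_rt, true_rt)]
    rel_err = [100.0 * e / t for e, t in zip(abs_err, true_rt)]
    return median(abs_err), median(rel_err)


# Hypothetical calibration compounds: model prediction vs. nano-HPLC measurement (s).
calib_predicted = [100.0, 300.0, 500.0, 700.0]
calib_measured = [400.0, 800.0, 1200.0, 1600.0]
slope, intercept = fit_projection(calib_predicted, calib_measured)

# Project predictions for new (hypothetical) compounds and score them.
new_projected = project([200.0, 600.0], slope, intercept)
mae_s, mre_pct = median_errors([620.0, 1350.0], new_projected)
```

The two values returned by `median_errors` correspond to the kind of figures the abstract reports as median absolute and median relative error after transfer.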

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00216-020-02905-0) contains supplementary material, which is available to authorized users.

* Eugene Nikolaev [email protected]
* Yury Kostyukevich [email protected]

1 Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Nobel Str., 3, 121205 Moscow, Russia

Introduction Untargeted screening of small molecules based on liquid chromatography coupled with mass spectrometry (LC-MS) has become a common practice in forensic analysis [1], doping control [2], drug discovery [3], medicine [4], food [5], and environmental chemistry [6]. The bottleneck of all untargeted approaches is a compound annotation that is mainly based on matching fragmented mass spectra to publicly available databases [7]. It significantly reduces the number of candidates obtained after accurate mass search; however, the fragmentation pattern depends on a certain instrument and collision energy settings and may result in a high ratio of fals
