Multiple imputation and direct estimation for qPCR data with non-detects


METHODOLOGY ARTICLE

Open Access

Valeriia Sherina1, Helene R. McMurray2,3, Winslow Powers4, Hartmut Land2, Tanzy M. T. Love1 and Matthew N. McCall1,2*

*Correspondence: [email protected]
2 Department of Biomedical Genetics, University of Rochester Medical Center, 601 Elmwood Ave., 14642 Rochester, NY, USA
Full list of author information is available at the end of the article

Abstract

Background: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a measurement of expression. While most current software replaces these non-detects with a value representing the limit of detection, this introduces substantial bias in the estimation of both absolute and differential expression. Single imputation procedures, while an improvement on previously used methods, underestimate residual variance, which can lead to anti-conservative inference.

Results: We propose to treat non-detects as non-random missing data, model the missing data mechanism, and use this model to impute missing values or obtain direct estimates of model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. We assess the proposed methods via simulation studies and demonstrate the applicability of these methods to three experimental data sets. We compare our methods to mean imputation, single imputation, and a penalized EM algorithm incorporating non-random missingness (PEMM). The developed methods are implemented in the R/Bioconductor package nondetects.

Conclusions: The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments in the presence of non-detects, providing increased confidence in downstream analyses.

Keywords: Gene expression, Quantitative real-time PCR (qPCR), Missing not at random (MNAR), Non-detects, Direct estimation, Multiple imputation
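The bias described in the abstract can be illustrated with a small simulation. The Python sketch below is only a toy example under stated assumptions: the normal distribution of cycle-threshold (Ct) values, the logistic missingness curve, and the limit of 40 cycles are hypothetical choices, not values taken from this article, and the code does not use or reproduce the nondetects package. It compares the true mean Ct with the mean obtained after replacing non-detects by the limit of detection and with the mean of the detected reactions alone.

```python
import math
import random
import statistics


def simulate_ct(n, mu=35.0, sd=2.0, lod=40.0, seed=0):
    """Simulate Ct values whose chance of being a non-detect rises with Ct.

    All parameters are illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    true_ct, observed = [], []
    for _ in range(n):
        ct = rng.gauss(mu, sd)
        # Hypothetical missing-not-at-random mechanism: reactions beyond the
        # limit are never detected; below it, P(non-detect) follows a logistic
        # curve centred at 37 cycles.
        if ct > lod:
            p_missing = 1.0
        else:
            p_missing = 1.0 / (1.0 + math.exp(-(ct - 37.0)))
        true_ct.append(ct)
        observed.append(None if rng.random() < p_missing else ct)
    return true_ct, observed


def summarize(true_ct, observed, lod=40.0):
    """Compare the true mean with two naive treatments of non-detects."""
    detected = [c for c in observed if c is not None]
    lod_filled = [lod if c is None else c for c in observed]
    return {
        "true_mean": statistics.fmean(true_ct),
        "lod_substitution_mean": statistics.fmean(lod_filled),
        "complete_case_mean": statistics.fmean(detected),
        "non_detect_fraction": 1 - len(detected) / len(observed),
    }


true_ct, observed = simulate_ct(5000)
summary = summarize(true_ct, observed)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, substituting the limit of detection pushes the estimated mean Ct above the true mean, while discarding non-detects pulls it below, because the missing reactions are disproportionately those with high Ct. Neither naive treatment recovers the true mean, which is the motivation for modelling the missingness mechanism explicitly.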

Background

Polymerase chain reaction (PCR) uses short-length oligonucleotide primers to initiate and direct synthesis of new DNA copies using DNA polymerase plus single-stranded DNA as a template [1]. Oligonucleotides complementary to each of the two possible sequences relating to the sense and anti-sense strands of the target DNA are included in the reaction, allowing both strands to be amplified simultaneously. These new DNA copies are added to the pool of DNA templates and the process is repeated multiple times,

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Com