Methods for diversity and overlap analysis in T-cell receptor populations

Mathematical Biology

Grzegorz A. Rempala · Michal Seweryn

Received: 6 March 2012 / Revised: 29 August 2012
© Springer-Verlag 2012

Abstract The paper presents novel approaches to the empirical analysis of diversity and similarity (overlap) in biological or ecological systems. The analysis is motivated by molecular studies of highly diverse mammalian T-cell receptor (TCR) populations and is related to the classical statistical problem of analyzing two-way contingency tables with missing cells and low cell counts. New measures of diversity and overlap are proposed, based on information-theoretic as well as geometric considerations, with the capacity to naturally up-weight or down-weight rare and abundant population species. Consistent estimates are derived by applying the Good–Turing sample-coverage correction. In particular, novel consistent estimates of the Shannon entropy function and the Morisita–Horn index are provided. Data from TCR populations in mice are used to illustrate the empirical performance of the proposed methods vis-à-vis the existing alternatives.

Keywords Contingency tables · Antigen receptors · Richness and diversity estimation · Rényi's entropy · Rényi's divergence

Mathematics Subject Classification (2000) 62P10 · 92B05 · 94A17

This research was partially supported by US NIH grant R01CA-152158 (GAR, MS) and US NSF grant DMS-1106485 (GAR).

G. A. Rempala (corresponding author) · M. Seweryn
Department of Biostatistics and Cancer Research Center, Georgia Health Sciences University, Augusta, GA 30912, USA
e-mail: [email protected]

M. Seweryn
Department of Mathematics and Computer Science, University of Lodz, Lodz, Poland
e-mail: [email protected]; [email protected]


1 Introduction

The recent successes of the Panvax study (see, e.g., Mohebtash et al. 2011) have invigorated the scientific efforts to obtain a vertebrate cancer vaccine and, consequently, reignited interest in the systematic analysis of T-cell populations. In vertebrates, T-cell populations are typically analyzed in terms of their capacity to recognize the so-called antibody-generating molecules, or antigens. An antigen is a foreign molecule which, when introduced into the body of a vertebrate, triggers antibody production by the immune system. This immune response is initiated when T-cells recognize and respond to antigens via their T-cell receptors (TCRs). TCRs are heterodimeric proteins with two chains: α and β in αβ T-cells, and γ and δ in γδ T-cells. The genes encoding these proteins are generated by the so-called V(D)J DNA recombination during thymic T-cell development. In this process, T-cell precursors randomly recombine different V, D, and J gene segments and assemble the mature gene encoding a TCR chain. By enumeration of all such possible recombinations alone, one concludes that there are on the order of $10^{18}$ distinct TCR chains in humans (Janeway 2005) and $10^{15}$ in mice (Davis and Bjorkman 1988