Proteotranscriptomics assisted gene annotation and spatial proteomics of Bombyx mori BmN4 cell line


RESEARCH ARTICLE

Open Access

Michal Levin*, Marion Scheibe and Falk Butter*

Abstract

Background: Identifying all coding regions in a genome is crucial for any study at the level of molecular biology, ranging from single-gene cloning to genome-wide measurements using RNA-seq or mass spectrometry. While satisfactory annotation has been made feasible for well-studied model organisms through the considerable efforts of large consortia, for most systems this kind of data is either absent or not adequately precise.

Results: Combining in-depth transcriptome sequencing and high-resolution mass spectrometry, we here use proteotranscriptomics to improve the annotation of protein-coding genes in the Bombyx mori cell line BmN4, which is an increasingly used tool for the analysis of piRNA biogenesis and function. Using this approach, we provide the exact coding sequence and protein-level evidence for more than 6200 genes. Furthermore, using spatial proteomics, we establish the subcellular localization of thousands of these proteins. We show that our approach outperforms current Bombyx mori annotation attempts in terms of accuracy and coverage.

Conclusions: We show that proteotranscriptomics is an efficient, cost-effective and accurate approach to improve previous annotations or generate new gene models. As this technique is based on de novo transcriptome assembly, it makes it possible to study any species even in the absence of genome sequence information, a situation in which proteogenomics would be impossible.

Keywords: Proteotranscriptomics, Mass spectrometry, Gene assembly, Gene annotation, Spatial proteomics
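The core idea of proteotranscriptomics, as summarized in the abstract, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`six_frame_orfs`, `peptide_supported`), the minimum ORF length and the toy transcript and peptide are all hypothetical, and a real analysis would use a dedicated assembler and a mass spectrometry search engine. The sketch only shows the principle: translate an assembled transcript in all six reading frames, collect candidate open reading frames, and keep those that are supported by an identified peptide.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(transcript, min_len=5):
    """Yield candidate ORFs (starting at Met) from all six reading frames.

    min_len is a hypothetical cutoff in amino acids, chosen for the toy example.
    """
    seq = transcript.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Split the frame translation at stop codons.
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start >= 0 and len(segment) - start >= min_len:
                    yield segment[start:]


def peptide_supported(orfs, peptides):
    """Keep only ORFs that contain at least one identified peptide."""
    return [orf for orf in orfs if any(p in orf for p in peptides)]


if __name__ == "__main__":
    # Toy transcript: two flanking bases around ATG GCT AAA CGT GAA TTT TGA.
    toy_transcript = "GGATGGCTAAACGTGAATTTTGACC"
    toy_peptides = ["AKREF"]  # hypothetical peptide identification
    candidates = list(six_frame_orfs(toy_transcript))
    print(peptide_supported(candidates, toy_peptides))
```

Because the candidate sequences are derived from the transcripts themselves, no reference genome is needed at any step, which is the property the Conclusions paragraph emphasizes.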

* Correspondence: [email protected]; [email protected]
Institute of Molecular Biology (IMB), Ackermannweg 4, 55128 Mainz, Germany

Background

Bombyx mori was the first lepidopteran species for which a draft genome was published, in 2004 [1, 2]. In 2008, a more accurate genome assembly was generated by combining the raw data of these initial efforts within an international collaboration [3], and the results are available at SilkDB (www.silkdb.org) and KAIKObase (sgp.dna.affrc.go.jp/index.html). However, for a large number of modern applications, such as transcriptomic, epigenomic and proteomic studies, reverse genetic screens and genome editing tools such as TALEN and CRISPR/Cas9, the provided genome information is insufficient, as this assembly contains numerous non-sequenced chromosome regions. Recently, parallel to our efforts to reannotate Bombyx mori using proteotranscriptomics, two new initiatives provided improved genome assemblies. These new assemblies have been made available as SilkBase [4] and SilkDB 3.0 [5] and include more genomic regions and gene predictions for 16,880 and 16,069 gene models, respectively. However, the provided gene models are still based on automated gene prediction using limited full-length cDNA libraries, poly-A RNA-seq data and previous B. mori NCBI annotations. These predictions are made with a