A short guide to long non-coding RNA gene nomenclature

LETTER TO THE EDITOR

Open Access

Mathew W Wright

Abstract

The HUGO Gene Nomenclature Committee (HGNC) is the only organisation authorised to assign standardised nomenclature to human genes. Of the 38,000 approved gene symbols in our database (www.genenames.org), the majority represent protein-coding (pc) genes; however, we also name pseudogenes, phenotypic loci, and some genomic features, and to date have named more than 8,500 human non-protein coding RNA (ncRNA) genes and ncRNA pseudogenes. We have already established unique names for most of the small ncRNA genes by working with experts for each class. Small ncRNAs can be grouped into their respective classes by shared homology and common function. In contrast, long non-coding RNA (lncRNA) genes represent a disparate set of loci related only by their size (more than 200 bases in length); they share no conserved sequence homology and have variable functions. As with pc genes, lncRNAs are named, wherever possible, based on the known function of their product; a short guide is presented herein to help authors when developing novel gene symbols for lncRNAs with characterised function. Researchers must contact the HGNC with their suggestions prior to publication, to check whether the proposed gene symbol can be approved. Although thousands of lncRNAs have been predicted in the human genome, the function of the vast majority remains unresolved. lncRNA genes with no known function are named based on their genomic context. Working with lncRNA researchers, the HGNC aims to provide unique and, wherever possible, meaningful gene symbols to all lncRNA genes.

Keywords: Long non-coding RNA, Nomenclature, ncRNA, lncRNA

Correspondence: [email protected]
HUGO Gene Nomenclature Committee (HGNC), EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Introduction

Since its inception in the 1970s, the HUGO Gene Nomenclature Committee (HGNC) [1] has kept pace with the discovery and characterisation of new human genes, providing each gene with a unique symbol and name and thus aiding effective scientific communication. By the time the initial sequence of the human genome was published in 2001 [2], the HGNC database (www.genenames.org) [3] contained more than 13,000 approved gene names, mostly for protein-coding genes, with only around 200 non-coding RNA (ncRNA) gene names. With the burgeoning research interest in ncRNAs over the last decade, the number of ncRNA loci with gene names has expanded to more than 8,500; about 2,000 of these represent long non-coding RNA (lncRNA) genes. Whereas classes of small ncRNAs can be defined by their shared homology and common function [4], lncRNA genes are a disparate set of loci related only by their size (more than 200 bases in length), are non-homologous, and have variable functions [5]. Their discovery has been further complicated because they are expressed at very low levels, sometimes only at specific developmental stages, and in spec