Numbering


genomic reference sequences


coding DNA reference sequences

nucleotide numbering is based on the annotated protein isoform, the major translation product.

Initial recommendations (Antonarakis (1998) and Den Dunnen & Antonarakis (2000)) suggested two alternative formats for describing intronic variants: c.88+2T>G / c.89-1G>T and c.IVS2+2T>G / c.IVS2-1G>T. The IVS format (c.IVS2+2T>G / c.IVS2-1G>T) has been retracted and should not be used.
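As an illustration of the difference between the two formats, the sketch below flags descriptions written in the retracted IVS style. The function name and both regular expressions are assumptions made for this example; they only cover simple intronic substitutions like the ones quoted above, not the full syntax of variant descriptions.

```python
import re

# Hypothetical pattern for the retracted intron style, e.g. c.IVS2+2T>G
RETRACTED_IVS = re.compile(r"^c\.IVS\d+[+-]\d+[ACGT]>[ACGT]$")

# Hypothetical pattern for the retained offset style, e.g. c.88+2T>G
OFFSET_STYLE = re.compile(r"^c\.\d+[+-]\d+[ACGT]>[ACGT]$")


def classify_intronic_substitution(description: str) -> str:
    """Return 'retracted', 'recommended', or 'unrecognized' for a description.

    Only simple intronic substitutions are handled; anything else is
    reported as 'unrecognized' rather than guessed at.
    """
    if RETRACTED_IVS.match(description):
        return "retracted"
    if OFFSET_STYLE.match(description):
        return "recommended"
    return "unrecognized"


for text in ("c.88+2T>G", "c.89-1G>T", "c.IVS2+2T>G", "c.IVS2-1G>T"):
    print(text, "->", classify_intronic_substitution(text))
```

A check like this could be used to warn about legacy descriptions in older data, but it is not a validator for variant descriptions in general.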


non-coding DNA reference sequences


RNA reference sequences

nucleotide numbering for an RNA reference sequence follows that of the associated coding or non-coding DNA reference sequence; nucleotide r.123 relates to c.123 or n.123.
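Because the RNA position reuses the numbering of the associated DNA reference sequence, relating the two is only a change of prefix. The sketch below shows that relationship; the function name and its arguments are hypothetical and it handles positions only, not full variant descriptions.

```python
def rna_to_dna_position(rna_position: str, dna_prefix: str = "c") -> str:
    """Map an RNA position such as 'r.123' to the matching DNA position.

    dna_prefix is 'c' for a coding DNA reference sequence or 'n' for a
    non-coding DNA reference sequence. The position itself is unchanged.
    """
    if dna_prefix not in ("c", "n"):
        raise ValueError("dna_prefix must be 'c' or 'n'")
    if not rna_position.startswith("r."):
        raise ValueError("expected a position starting with 'r.'")
    # Keep everything after the 'r.' prefix exactly as written.
    return f"{dna_prefix}.{rna_position[2:]}"


print(rna_to_dna_position("r.123"))       # coding DNA reference
print(rna_to_dna_position("r.123", "n"))  # non-coding DNA reference
```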


protein reference sequences

amino acid numbering is p.1, p.2, p.3, …, from the first to the last amino acid of the reference sequence.
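The numbering is a plain 1-based count along the protein reference sequence. As a minimal sketch, assuming the sequence is available as a string of one-letter amino acid codes (the function name and the short example sequence are made up for illustration):

```python
def number_amino_acids(sequence: str) -> list[tuple[str, str]]:
    """Pair each amino acid with its position label, starting at p.1."""
    return [
        (f"p.{index}", residue)
        for index, residue in enumerate(sequence, start=1)
    ]


# Illustrative three-residue sequence, not taken from a real protein.
for label, residue in number_amino_acids("MKT"):
    print(label, residue)
```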


Q&A


Figure

Reference Sequence Figure


Examples

The basic recommendation is that the reference sequence used represents the major and largest transcript of the gene. The MANE Select transcript available from the MANE project (see Ensembl or NCBI) is preferred if it is a suitable reference for describing the variant. Variants present in alternative transcripts, not covered by the selected reference transcript, can be described based on annotated alternative transcript variants (e.g. NM_001099404.2, LRG_199t3) or protein isoforms (e.g. NP_001092874.1, LRG_199p3), preferring MANE Plus Clinical transcripts if specified by the MANE project. Contact the MANE project (MANE-help@ncbi.nlm.nih.gov or mane-help@ebi.ac.uk) if any pathogenic variants cannot be adequately described using available MANE transcripts.
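The order of preference described above (MANE Select first, then MANE Plus Clinical, then other annotated transcripts) can be sketched as a simple ranking. Everything in this example is an assumption made for illustration: the `Transcript` class, its fields, the status labels, and the placeholder accessions are hypothetical and do not come from the MANE project or any real API. Whether a transcript is a suitable reference for a given variant is represented here by a precomputed flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    accession: str               # placeholder identifier, hypothetical
    mane_status: Optional[str]   # "select", "plus_clinical", or None
    covers_variant: bool         # True if the variant can be described on it


# Lower rank means more preferred.
PREFERENCE = {"select": 0, "plus_clinical": 1, None: 2}


def pick_reference(candidates: list[Transcript]) -> Optional[Transcript]:
    """Return the most preferred transcript that covers the variant."""
    usable = [t for t in candidates if t.covers_variant]
    if not usable:
        # Nothing suitable: the text above suggests contacting the MANE project.
        return None
    return min(usable, key=lambda t: PREFERENCE[t.mane_status])


candidates = [
    Transcript("TX_OTHER", None, True),
    Transcript("TX_PLUS_CLINICAL", "plus_clinical", True),
    Transcript("TX_SELECT", "select", False),  # does not cover this variant
]
chosen = pick_reference(candidates)
print(chosen.accession if chosen else "no suitable transcript")
```

In this example the MANE Select transcript does not cover the variant, so the MANE Plus Clinical transcript is chosen ahead of the other annotated transcript.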