Reference Sequences


A reference sequence is a sequence file used as a reference to describe the variants present in the sequence being analysed.

NOTE: this section has been updated based on the accepted proposal SVD-WG008 (Reference Sequences).

A sequence variant is defined in the context of a reference sequence, which must be referred to by a unique sequence identifier. Because a reference sequence defines the numbering system and the default state of a sequence (e.g. coding transcript, non-coding transcript), a sequence variant can only be interpreted accurately if both the reference sequence and its corresponding identifier are unchangeable.

Reference Sequence Types

Depending on the variants to be reported, different reference sequence files are used at the DNA, RNA or protein level. It is mandatory to indicate the type of reference sequence file using a prefix preceding the variant description. Approved reference sequence types are c., g., m., n., o., p. and r.:

DNA - genomic reference sequence (g.)

DNA - circular genomic reference sequence (o.)

DNA - mitochondrial reference sequence (m.)

DNA - coding DNA reference sequence (c.)

DNA - non-coding DNA reference sequence (n.)

RNA reference sequence (r.)

protein reference sequence (p.)
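As an illustration of the prefix rule, the following minimal Python sketch maps each approved prefix to the reference sequence type listed above and reads the prefix from a variant description. The names REFERENCE_TYPES and reference_type are hypothetical, and the sketch assumes that the reference sequence identifier has already been separated from the description; the sample descriptions are illustrative only.

```python
# Prefixes and type names copied from the list above.
# REFERENCE_TYPES and reference_type are hypothetical names used for illustration.
REFERENCE_TYPES = {
    "g": "DNA - genomic reference sequence",
    "o": "DNA - circular genomic reference sequence",
    "m": "DNA - mitochondrial reference sequence",
    "c": "DNA - coding DNA reference sequence",
    "n": "DNA - non-coding DNA reference sequence",
    "r": "RNA reference sequence",
    "p": "protein reference sequence",
}


def reference_type(description: str) -> str:
    """Return the reference sequence type indicated by the prefix of a
    variant description such as 'c.4375C>T'.

    Assumption: `description` is only the part that follows the reference
    sequence identifier; the identifier itself is not inspected here.
    """
    # Split at the first '.', which separates the prefix from the rest.
    prefix, dot, rest = description.partition(".")
    if not dot or not rest or prefix not in REFERENCE_TYPES:
        raise ValueError(f"missing or unapproved prefix in {description!r}")
    return REFERENCE_TYPES[prefix]


if __name__ == "__main__":
    for example in ("c.4375C>T", "g.123A>G", "r.76a>c"):
        print(example, "->", reference_type(example))
```

The lookup is deliberately case-sensitive, since the approved prefixes are all lower case; a description without an approved prefix is rejected rather than guessed at.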


(1) An opaque identifier is one that acts only as a name for an object and is not intended to be parsed for additional meaning. Contrast this with a RefSeq identifier, for example, which conveys annotation level (N versus X), type (M, R, C, etc.) and version number. This comment is therefore intended to tell implementers that they may not rely on parsing the identifier to decide how the implementation works.

Why not? Two reasons: