Protein Recommendations

Substitution Variant


Definitions

Substitution
a sequence change where, compared to a reference sequence, one amino acid is replaced by one other amino acid.

Description

Format: “prefix”“amino_acid”“position”“new_amino_acid”, e.g. p.(Arg54Ser)

“prefix” = reference sequence used = p.
“amino_acid” = reference amino acid = Arg
“position” = position amino acid subtituted = 54
“new_amino_acid” = new amino acid = Ser


Note

  • all variants should be described at the DNA level, descriptions at the RNA and protein level may be given in addition
  • prefix reference sequence accepted is “p.” (protein)
  • predicted consequences, i.e. without experimental evidence (no RNA or protein sequence analysed), should be given in parentheses, e.g. p.(Arg727Ser)
  • a nonsense variant, a variant changing an amino acid to a translation termination (stop) codon, is described as a substitution
    NOTE: a nonsense variant is not described as a Deletion of the C-terminal end of the protein (e.g. p.Trp26_Arg1623del)
    • variants which introduce an immediate translation termination (stop) codon are described as nonsense variant
    • NOTE: not p.Tyr4TerfsTer1 but p.Tyr4Ter (or p.Tyr4), not p.Tyr4_Cys5insTerGluAsp but p.Tyr4Ter (or p.Tyr4), not p.Cys5_Ser6delinsTerGluAsp but p.Tyr4Ter (or p.Tyr4*)
  • a no-stop variant, a variant changing the translation termination codon into an amino acid codon, is described as a extension (Extension)
  • changes involving two or more consecutive amino acids are described as a deletion/insertion variant (delins) (see Deletion/insertion (delins))
    • the description p.Arg76_Cys77delinsSerTrp is correct, the description p.[Arg76Ser;Cys77Trp] is not correct
  • amino acids that have been tested and found not changed (silent) are described as p.Cys123= (see SVD-WG001 (no change))

Examples

  • missense
    • LRG_199p1:p.Trp24Cys
      amino acid Trp24 is changed to a Cys
    • NP_003997.1:p.(Trp24Cys)
      amino acid Trp24is predicted to change to a Cys (no experimental proof, e.g. based on DNA level data)
  • nonsense
    • LRG_199p1:p.Trp24Ter (p.Trp24*)
      amino acid Trp24 is changed to a stop codon (Ter, *)
      NOTE: this change is not described as a deletion of the C-terminal end of the protein (i.e. p.Trp24_Met36853del)
  • silent (no change)
    • NP_003997.1:p.Cys188=
      amino acid Cys188 is not changed (DNA level change ..TGC.. to ..TGT..)
      NOTE: the description p.= means the entire protein coding region was analysed and no variant was found that changes (or is predicted to change) the protein sequence.
  • translation initiation codon
    • no protein
      LRG_199p1:p.0
      as a consequence of a variant in the translation initiation codon no protein is produced
      NOTE: LRG_199p1:p.0? can be used when you predict that no protein is produced. Do not use descriptions like “p.Met1Thr”, this is for sure not the consequence of the effect on protein translation
    • unknown
      LRG_199p1:p.(Met1?)
      the consequence, at the protein level, of a variant affecting the translation initiation codon can not be predicted (i.e. is unknown)
    • new translation initiation site
      • downstream - NP_003997.1:p.Leu2_Met124del (deletion)
        a variant in the translation initiation codon causes the activation of a downstream translation initiation site (Met) resulting in deletion of the first 123 amino acids (Met-1 to Val-123) of the protein.
        NOTE: the 3’ rule applies.
      • upstream - p.Met1_Leu2insArgSerThrVal (insertion)
        a variant in the translation initiation codon (Met1) changes it to a Valine (Val) and activates an upstream translation initiation site at position -4, replacing amino acid Met1 with MetArgSerThrVal. Applying the 3’ rule the variant is described as an insertion.
        NOTE: this variant is not described as an extension
      • new - p.Met1ext-5 (extension)
        a variant in the 5’ UTR activates a new in-frame upstream translation initiation site starting with amino acid Met-5 (see Extension)
  • translation termination codon (stop codon, no-stop change)
    see Extension
  • splicing
    • NP_003997.1:p.?
      the predicted consequence of variant NM_004006.2c.2622G>C is a silent change (p.(Lys874=)). Since it affects the last nucleotide of the exon it can not be excluded the variant affects splicing, having unknown consequences.
      NOTE: when others have reported the same variant, and were able to analyse RNA, you could consider to give the consequences they observed as the predicted consequences for the variant, e.g. r.[(2603_2622del,2622g>c)] p.[(Ser868Argfs*2,Ser868=)]
  • uncertain
    • NP_003997.1:p.(Gly56Ala^Ser^Cys)
      amino acid Gly56 is changed to an Ala, Ser or Cys (see Uncertain)
  • mosaic
    • LRG_199p1:p.Trp24=/Cys
      a mosaic case where at amino acid position 24 besides the normal amino acid (a Trp, described as “=”) also protein is found containing a Cys (Trp24Cys)
      NOTE: irrespective of the frequency in which each amino acid was found, the reference is always described first
      NOTE: for the predicted consequences of a variant the description is LRG_199t1:p.(Trp24=/Cys)

Q&A

Are polymorphisms described like p.2366Gln/Lys?

No, all substitutions are described as NP_003997.1:p.Gln2366Lys. In the past, the format p.2366Gln/Lys (p.2366Q/K) has been used to describe "polymorphic" sequence variants. Note that a description should be neutral, simply describe the change, and not include any other information like predicted or known functional consequences.

Can I describe a TrpVal to CysArg variant as a amino acid substitution (p.TrpVal24CysArg)?

No, this is not allowed. By definition a substitution changes one amino acid into one other amino acid. The change TrpVal to CysArg should be described as NP_003997.1:p.Trp24_Val25delinsCysArg, i.e. a deletion/insertion (indel) (see Deletion-Insertion).

How should you describe an amino acid substitution to any other amino acid?

HGVS uses IUPAC symbols (see Standards). The symbol for 'any' amino acid is 'X'/'Xaa'. Since 'X' has been used to indicate a translation stop codon (nonsense variant) we suggest to use 'Xaa' three-letter amino acid code only (e.g. p.Arg782Xaa).