
Exomiser configuration

This page gives a complete description of the Exomiser configuration used in the Rare Disease GMS. The general settings are summarised below:

  • Only variants with a "PASS" status are included
  • Population frequency cut-offs are applied
  • Predicted pathogenicity tools are enabled and this contributes to the score/rank
  • Inheritance filters are enabled and this contributes to the score/rank
  • The OMIM prioritiser is enabled and this contributes to the score/rank
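As an illustration of the first rule, the sketch below keeps only VCF records whose FILTER column is exactly "PASS". It is a minimal sketch working on raw VCF text lines, not Exomiser's own implementation. The helper names are hypothetical, and treating a missing FILTER value (".") as failing is an assumption that follows the strict "PASS only" wording above.

```python
# Minimal sketch of a "PASS only" rule applied to raw VCF data lines.
# Hypothetical helpers, not Exomiser code.
# Assumption: a missing FILTER value (".") is treated as not passing.

def filter_field(vcf_line: str) -> str:
    """Return the FILTER column, the 7th tab-separated field of a VCF data line."""
    return vcf_line.rstrip("\n").split("\t")[6]


def is_pass(vcf_line: str) -> bool:
    """Keep a record only if FILTER is exactly "PASS"."""
    return filter_field(vcf_line) == "PASS"


records = [
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2000\t.\tC\tT\t10\tLowQual\t.",
    "chr1\t3000\t.\tG\tA\t.\t.\t.",
]
kept = [r for r in records if is_pass(r)]
print(len(kept))  # 1
```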

Population allele frequencies

The table below defines the allele frequency threshold for each inheritance pattern considered by Exomiser.

Inheritance pattern             Frequency threshold (%)
AUTOSOMAL_DOMINANT              0.1
AUTOSOMAL_RECESSIVE_COMP_HET    2.0
AUTOSOMAL_RECESSIVE_HOM_ALT     0.1
X_DOMINANT                      0.1
X_RECESSIVE_COMP_HET            2.0
X_RECESSIVE_HOM_ALT             0.1
MITOCHONDRIAL                   0.2
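The per-pattern thresholds can be read as a lookup: a variant stays a candidate under a given inheritance pattern only if its population frequency does not exceed that pattern's cut-off. The sketch below assumes frequencies are expressed in percent (a value of 2.0 cannot be a fraction) and that a frequency exactly equal to the threshold passes; both the inclusive comparison and the function name are assumptions for illustration.

```python
# Maximum allele frequency, in percent, permitted under each inheritance
# pattern (values copied from the table above).
MAX_FREQ_PCT = {
    "AUTOSOMAL_DOMINANT": 0.1,
    "AUTOSOMAL_RECESSIVE_COMP_HET": 2.0,
    "AUTOSOMAL_RECESSIVE_HOM_ALT": 0.1,
    "X_DOMINANT": 0.1,
    "X_RECESSIVE_COMP_HET": 2.0,
    "X_RECESSIVE_HOM_ALT": 0.1,
    "MITOCHONDRIAL": 0.2,
}


def compatible_modes(max_pop_freq_pct: float) -> list:
    """Hypothetical helper: inheritance patterns whose cut-off the variant does not exceed.

    Assumption: a frequency equal to the threshold is allowed (inclusive <=).
    """
    return [mode for mode, cutoff in MAX_FREQ_PCT.items() if max_pop_freq_pct <= cutoff]


# A variant seen at 0.5% is too common for every pattern except the compound-heterozygous ones.
print(compatible_modes(0.5))  # ['AUTOSOMAL_RECESSIVE_COMP_HET', 'X_RECESSIVE_COMP_HET']
```

The higher cut-off for compound-heterozygous patterns reflects that each of the two alleles can individually be more common than a single dominant or homozygous causal allele.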

The allele frequency cut-offs are compared against frequencies from the following population sources.

Source            Sub-population
ESP               ESP_AFRICAN_AMERICAN
ESP               ESP_EUROPEAN_AMERICAN
ESP               ESP_ALL
ExAC              EXAC_AFRICAN_INC_AFRICAN_AMERICAN
ExAC              EXAC_AMERICAN
ExAC              EXAC_EAST_ASIAN
ExAC              EXAC_FINNISH
ExAC              EXAC_NON_FINNISH_EUROPEAN
ExAC              EXAC_SOUTH_ASIAN
ExAC              EXAC_OTHER
gnomAD exomes     GNOMAD_E_AFR
gnomAD exomes     GNOMAD_E_AMR
gnomAD exomes     GNOMAD_E_EAS
gnomAD exomes     GNOMAD_E_FIN
gnomAD exomes     GNOMAD_E_NFE
gnomAD exomes     GNOMAD_E_OTH
gnomAD exomes     GNOMAD_E_SAS
gnomAD genomes    GNOMAD_G_AFR
gnomAD genomes    GNOMAD_G_AMR
gnomAD genomes    GNOMAD_G_EAS
gnomAD genomes    GNOMAD_G_FIN
gnomAD genomes    GNOMAD_G_NFE
gnomAD genomes    GNOMAD_G_OTH
gnomAD genomes    GNOMAD_G_SAS
Others            THOUSAND_GENOMES
Others            UK10K
Others            TOPMED

Additionally, an internal Genomics England allele frequency dataset is used (see Exomiser database versions).
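Because a variant can have a frequency in many of these populations, a single value has to be chosen for comparison with the cut-offs. The sketch below assumes the highest frequency observed in any one listed population is used, so a variant that is common in just one population is treated as common overall. This "maximum across populations" rule is stated here as an assumption, and the function name is hypothetical.

```python
from typing import Mapping


def max_population_frequency(freqs_pct: Mapping[str, float]) -> float:
    """Hypothetical helper: highest allele frequency (percent) across populations.

    Populations in which the variant was not observed are simply absent from
    the mapping; a variant seen in none of them is given 0.0.
    """
    return max(freqs_pct.values(), default=0.0)


example = {"GNOMAD_E_NFE": 0.02, "GNOMAD_G_AFR": 0.35, "TOPMED": 0.11}
print(max_population_frequency(example))  # 0.35
```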

Phenotype scoring algorithms

The HiPhive algorithm is configured to use human, mouse and fish phenotype data, and to include protein-protein interaction (PPI) proximities in phenotype scores.

Variant scoring algorithms

REVEL and MVP are configured as “pathogenicitySources”, i.e. the tools whose predicted pathogenicity scores contribute to the variant score.

Note

REVEL is an ensemble method whose inputs include PolyPhen scores.
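When both REVEL and MVP provide a score for a missense variant, the two must be combined into one pathogenicity value. The sketch below assumes the higher of the available scores is used; both tools report values between 0 and 1, with higher meaning more likely damaging. The max-combination rule, the 0.0 default when neither score is available, and the function name are assumptions for illustration, and non-missense consequences are not modelled here.

```python
from typing import Optional


def missense_pathogenicity(revel: Optional[float], mvp: Optional[float]) -> float:
    """Hypothetical helper: combine REVEL and MVP scores (0-1, higher = more damaging).

    Assumption: take the maximum of whichever scores are available,
    falling back to 0.0 if neither tool scored the variant.
    """
    scores = [s for s in (revel, mvp) if s is not None]
    return max(scores, default=0.0)


print(missense_pathogenicity(0.42, 0.87))  # 0.87
```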

Variant consequences

A full list of possible variant consequences is available here

The following variant consequences are filtered out and not considered further.

Region                        Specific consequence
Untranslated region (UTR)     FIVE_PRIME_UTR_EXON_VARIANT
Untranslated region (UTR)     FIVE_PRIME_UTR_INTRON_VARIANT
Untranslated region (UTR)     THREE_PRIME_UTR_EXON_VARIANT
Untranslated region (UTR)     THREE_PRIME_UTR_INTRON_VARIANT
Transcript                    NON_CODING_TRANSCRIPT_EXON_VARIANT
Transcript                    NON_CODING_TRANSCRIPT_INTRON_VARIANT
Transcript                    CODING_TRANSCRIPT_INTRON_VARIANT
Intergenic                    UPSTREAM_GENE_VARIANT
Intergenic                    INTERGENIC_VARIANT
Intergenic                    REGULATORY_REGION_VARIANT
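The exclusion list works as a simple set-membership test. The sketch below assumes each variant has already been reduced to a single consequence term (for example, its most severe consequence across transcripts); that reduction step and the function name are assumptions, while the excluded terms are copied from the table above.

```python
# Consequence terms removed before scoring (copied from the table above).
EXCLUDED_CONSEQUENCES = frozenset({
    "FIVE_PRIME_UTR_EXON_VARIANT",
    "FIVE_PRIME_UTR_INTRON_VARIANT",
    "THREE_PRIME_UTR_EXON_VARIANT",
    "THREE_PRIME_UTR_INTRON_VARIANT",
    "NON_CODING_TRANSCRIPT_EXON_VARIANT",
    "NON_CODING_TRANSCRIPT_INTRON_VARIANT",
    "CODING_TRANSCRIPT_INTRON_VARIANT",
    "UPSTREAM_GENE_VARIANT",
    "INTERGENIC_VARIANT",
    "REGULATORY_REGION_VARIANT",
})


def is_considered(consequence: str) -> bool:
    """Hypothetical helper: True if the variant's (single) consequence is not excluded."""
    return consequence not in EXCLUDED_CONSEQUENCES


print(is_considered("INTERGENIC_VARIANT"))  # False
```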

Short tandem repeat (STR) expansion masking

The following STR loci produced a large number of artefactual calls caused by variability between individuals. Because these loci are better handled by our dedicated STR caller, we have excluded these regions from analysis in Exomiser:

  • chrX:147912048-147912058 (FMR1)
  • chr12:6936727-6936737 (ATN1)
  • chr14:92071009-92071011 (ATXN3)
  • chr20:46022942-46022952 (SLC12A5)
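A masked region can be applied as an overlap test on each variant's reference span. The sketch below uses the four regions listed above and assumes their coordinates are 1-based and inclusive, that chromosome names use the same "chr" prefix as the variants, and that any overlap with the reference allele is enough to mask a variant. These conventions and the function name are assumptions; the production masking may be implemented differently (for example, via a BED file).

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    gene: str


# Masked STR loci copied from the list above.
MASKED_STR_REGIONS = [
    Region("chrX", 147912048, 147912058, "FMR1"),
    Region("chr12", 6936727, 6936737, "ATN1"),
    Region("chr14", 92071009, 92071011, "ATXN3"),
    Region("chr20", 46022942, 46022952, "SLC12A5"),
]


def masked_gene(chrom: str, pos: int, ref: str) -> Optional[str]:
    """Hypothetical helper: gene name if the variant's reference span overlaps a masked region."""
    var_end = pos + len(ref) - 1  # last reference base covered by the variant
    for region in MASKED_STR_REGIONS:
        if chrom == region.chrom and pos <= region.end and var_end >= region.start:
            return region.gene
    return None


print(masked_gene("chrX", 147912050, "C"))  # FMR1
```

Checking the full reference span, rather than only the start position, means a deletion that begins just before a masked locus and runs into it is also excluded.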