
Exomiser implementation in Genomics England Rare Disease pipeline

Modes of inheritance

In the Genomics England Rare Disease pipeline, Exomiser is configured to remove low-quality (i.e. non-PASS) variants and non-coding variants (unless they are on the variant inclusion list). It then analyses the remaining variants under each of the following modes of inheritance (MOI):

  • autosomal dominant
  • autosomal recessive
  • X-linked dominant
  • X-linked recessive
  • mitochondrial
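The initial quality and variant-effect filtering can be sketched as follows. This is a minimal illustration, not the pipeline's code: the `Variant` record, its fields and the `prefilter` function are hypothetical, and "coding" is reduced to a single boolean rather than Exomiser's full variant-effect annotation. As the text describes, the inclusion list only rescues non-coding variants; non-PASS calls are removed regardless.

```python
from dataclasses import dataclass

# Hypothetical, simplified variant record; field names are illustrative only.
@dataclass(frozen=True)
class Variant:
    variant_id: str
    filter_status: str  # VCF FILTER value, e.g. "PASS"
    is_coding: bool     # simplification of the variant-effect annotation


def prefilter(variants, inclusion_list):
    """Keep PASS variants that are coding or appear on the inclusion list."""
    kept = []
    for v in variants:
        if v.filter_status != "PASS":
            continue  # low-quality call: always removed
        if not v.is_coding and v.variant_id not in inclusion_list:
            continue  # non-coding and not rescued by the inclusion list
        kept.append(v)
    return kept
```

For example, a PASS non-coding variant survives only if its ID is in `inclusion_list`; everything that survives is then evaluated separately under each MOI.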

Population frequencies

Variants compatible with the MOI are retained only if their minor allele frequency is below 0.1% (2% for compound heterozygotes; 0.2% for mitochondrial variants) in all of the following reference databases:

For the exact Exomiser configuration and database versions, see Exomiser configuration.
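The frequency rule above amounts to comparing the highest observed frequency across all reference sources with an MOI-dependent threshold. The sketch below assumes frequencies are supplied as fractions keyed by a source name (the names are placeholders, not real database identifiers) and that a variant absent from every source counts as rare. Whether the real comparison is strict (`<`) or inclusive is an assumption here.

```python
# Maximum allowed minor allele frequency (as a fraction), taken from the text.
MAF_THRESHOLDS = {
    "default": 0.001,               # 0.1%
    "compound_heterozygous": 0.02,  # 2%
    "mitochondrial": 0.002,         # 0.2%
}


def passes_frequency_filter(freqs_by_source, category="default"):
    """True if the variant is below the threshold in every reference source.

    freqs_by_source maps a (placeholder) source name to an allele frequency
    expressed as a fraction; sources where the variant was not observed are
    simply absent, so an unobserved variant passes.
    """
    threshold = MAF_THRESHOLDS[category]
    return all(freq < threshold for freq in freqs_by_source.values())
```

Because the check uses `all(...)`, a single source above the threshold is enough to remove the variant under that MOI.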

Variant score calculation

Exomiser then calculates a score (on a scale of 0 to 1) for how rare and how deleterious each variant is, using the frequency sources above together with the deleteriousness predictions of PolyPhen-2, SIFT and MutationTaster provided by dbNSFP.
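One way to combine rarity and predicted deleteriousness into a single 0 to 1 score is sketched below. Several details are assumptions, not confirmed by this page: the rarity curve (1.13533 − 0.13533·e^f, with f the highest frequency in percent, and 0 above 2%) is what we believe Exomiser uses but should be checked against the version in use; the predictors are combined by taking the most damaging one, with SIFT inverted because low SIFT scores indicate damage; and the final score is taken as the product of the two parts.

```python
import math


def frequency_score(max_freq_percent: float) -> float:
    """Rarity score in [0, 1] from the highest population frequency (percent).

    ASSUMED curve; verify against the Exomiser version in use.
    """
    if max_freq_percent <= 0:
        return 1.0  # not observed in any reference source: maximally rare
    if max_freq_percent > 2:
        return 0.0  # too common to earn any rarity credit
    return 1.13533 - 0.13533 * math.exp(max_freq_percent)


def pathogenicity_score(polyphen2=None, sift=None, mutation_taster=None) -> float:
    """Most damaging available prediction (ASSUMED max-combination).

    PolyPhen-2 and MutationTaster scores rise with deleteriousness; SIFT
    scores fall, so SIFT is inverted. With no predictions, returns 0.0.
    """
    candidates = []
    if polyphen2 is not None:
        candidates.append(polyphen2)
    if sift is not None:
        candidates.append(1.0 - sift)
    if mutation_taster is not None:
        candidates.append(mutation_taster)
    return max(candidates, default=0.0)


def variant_score(max_freq_percent: float, **predictions) -> float:
    """ASSUMED: variant score = frequency score x pathogenicity score."""
    return frequency_score(max_freq_percent) * pathogenicity_score(**predictions)
```

For instance, `variant_score(0, polyphen2=0.9)` gives 0.9: an unobserved variant keeps its full predicted deleteriousness, while a variant above 2% scores 0 whatever the predictors say.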

For each MOI, the highest-scoring compatible variant in each gene (or the two highest-scoring variants, for compound-heterozygous candidates) is selected as the contributing variant(s) for that gene under that MOI. These are used to assign a gene-level variant score; for compound heterozygotes, this is the mean of the two variant scores.
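The per-gene selection can be sketched as below, for one gene under one MOI. The input format (a dict of variant ID to variant score for MOI-compatible variants) and the choice to return a score of 0 when there are too few variants are assumptions made for illustration.

```python
def gene_variant_score(variant_scores, compound_het=False):
    """Return (gene-level variant score, contributing variant IDs).

    variant_scores maps variant ID -> variant score (0-1) for the variants in
    one gene that are compatible with the MOI being evaluated.
    """
    ranked = sorted(variant_scores, key=variant_scores.get, reverse=True)
    n_needed = 2 if compound_het else 1
    if len(ranked) < n_needed:
        return 0.0, []  # assumption: no qualifying variant(s), no contribution
    contributors = ranked[:n_needed]
    # A single variant's score, or the mean of the top two for a
    # compound-heterozygous pair.
    score = sum(variant_scores[v] for v in contributors) / n_needed
    return score, contributors
```

With scores `{"v1": 0.9, "v2": 0.6, "v3": 0.3}`, a dominant model contributes `v1` alone (0.9), while a compound-heterozygous model contributes `v1` and `v2` with a gene-level score of 0.75.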

Additionally, a variant inclusion list is configured, based on ClinVar data and extended with pathogenic/likely pathogenic variants from GMS data. Exomiser treats any variant on the inclusion list as maximally deleterious (i.e. a score of 1), regardless of its other annotations (e.g. variant effect, allele frequency, predicted deleteriousness). As a result, Exomiser results can contain variants that would otherwise be excluded from consideration, such as non-coding variants.
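The override can be pictured as a check that runs before any effect- or frequency-based exclusion. The `AnnotatedVariant` record and `considered_score` function below are hypothetical simplifications; the variant-effect annotation is again reduced to a boolean, and quality filtering is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotatedVariant:
    variant_id: str
    is_coding: bool         # simplification of the variant-effect annotation
    passes_frequency: bool  # result of the MOI-specific frequency filter
    computed_score: float   # score from frequency and predicted deleteriousness


def considered_score(v: AnnotatedVariant, inclusion_list: set) -> Optional[float]:
    """Score used downstream, or None if the variant is excluded."""
    if v.variant_id in inclusion_list:
        return 1.0  # maximally deleterious, whatever the other annotations say
    if not v.is_coding or not v.passes_frequency:
        return None
    return v.computed_score
```

A common, non-coding variant on the inclusion list therefore still reaches the results with a score of 1.0, which is how such variants can appear in Exomiser output.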

Phenotype score calculation

In parallel, Exomiser produces a phenotype score for each gene (on a scale of 0 to 1) based on the phenotypic similarity between the patient’s phenotypes and:

  • OMIM and Orphanet rare diseases known to be associated with the gene,
  • mouse and zebrafish models associated with the orthologue of the gene,
  • disease, mouse or zebrafish phenotypes associated with neighbouring genes in the StringDB protein-protein association database (with scores down-weighted according to network distance from the gene under consideration).
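A simplified picture of how direct and network-derived evidence might be pooled for one gene is sketched below. The geometric down-weighting (`decay ** distance`) and its default of 0.5 are hypothetical; the text only states that neighbour scores are weighted down with network distance. Taking the maximum reflects the statement that the highest comparison score becomes the gene-level phenotype score.

```python
def network_weighted_phenotype_score(direct_scores, neighbour_scores, decay=0.5):
    """Gene-level phenotype score from direct and neighbour evidence.

    direct_scores: similarity scores (0-1) to diseases and mouse/zebrafish
        models linked to this gene (or its orthologue).
    neighbour_scores: (score, network_distance) pairs for evidence attached
        to neighbouring genes in the protein-protein association network.
    decay: HYPOTHETICAL per-step down-weighting factor.
    """
    candidates = list(direct_scores)
    # Evidence from neighbours counts for less the further away they are.
    candidates += [score * decay ** distance for score, distance in neighbour_scores]
    return max(candidates, default=0.0)
```

Under these assumptions, a strong match (0.8) on a direct neighbour contributes 0.4, so it only determines the gene's score when no direct disease or model match is better.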

This scoring uses the OWLSim algorithm to compare phenotypes semantically, so that similar but non-identical phenotypes can still be matched. Each match is weighted according to how distant the two terms are in the ontology and how frequently the phenotype they have in common is observed. The highest score from these comparisons is assigned as the gene-level phenotype score.
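The idea of weighting matches by how informative the shared phenotype is can be illustrated with a Resnik-style measure: the similarity of two terms is the information content (IC = −log frequency) of their most informative common ancestor. This is a deliberate simplification, not OWLSim itself; the toy ontology, term names and frequencies below are hypothetical, and the raw IC values are not scaled to 0 to 1 as Exomiser's score is.

```python
import math

# Toy ontology (hypothetical terms, not real HPO IDs): term -> parent terms.
PARENTS = {
    "abnormality": set(),
    "seizure": {"abnormality"},
    "focal_seizure": {"seizure"},
    "generalized_seizure": {"seizure"},
    "short_stature": {"abnormality"},
}

# Hypothetical fraction of annotated diseases/models carrying each term
# (directly or via a descendant); rarer terms are more informative.
TERM_FREQUENCY = {
    "abnormality": 1.0,
    "seizure": 0.2,
    "focal_seizure": 0.05,
    "generalized_seizure": 0.08,
    "short_stature": 0.1,
}


def ancestors(term):
    """The term itself plus all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(term):
    return -math.log(TERM_FREQUENCY[term])


def term_similarity(a, b):
    """Resnik-style: IC of the most informative common ancestor."""
    return max(information_content(t) for t in ancestors(a) & ancestors(b))


def profile_similarity(patient_terms, model_terms):
    """Mean, over patient terms, of the best match among the model's terms."""
    best = [max(term_similarity(p, m) for m in model_terms) for p in patient_terms]
    return sum(best) / len(best)


def best_model_similarity(patient_terms, models):
    """Highest similarity across all diseases/models linked to a gene."""
    return max(profile_similarity(patient_terms, m) for m in models)
```

Here "focal_seizure" and "generalized_seizure" still match through their shared parent "seizure" (IC ≈ 1.61), whereas "focal_seizure" and "short_stature" share only the root and score 0.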

Overall Exomiser score

Finally, a logistic regression model combines the phenotype and variant scores to produce an overall Exomiser score (scaled from 0 to 1) for each gene and its contributing variants under each compatible MOI. Variants are ranked by their overall Exomiser score; the highest-ranked (rank = 1) variant(s) are the most likely disease-causing candidates according to Exomiser.
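A logistic combination of the two gene-level scores, followed by ranking, might look like the sketch below. The coefficients (`b0`, `b_pheno`, `b_variant`) are hypothetical placeholders; the fitted values of the real model are not given here. Candidates are labelled per gene and MOI.

```python
import math


def combined_score(phenotype_score, variant_score,
                   b0=-10.0, b_pheno=9.0, b_variant=10.0):
    """Logistic combination of gene-level scores (HYPOTHETICAL coefficients)."""
    logit = b0 + b_pheno * phenotype_score + b_variant * variant_score
    return 1.0 / (1.0 + math.exp(-logit))  # squashes the logit into (0, 1)


def rank(candidates):
    """candidates: (label, phenotype_score, variant_score) per gene and MOI.

    Returns (rank, label, combined score) tuples, best first.
    """
    scored = [(label, combined_score(p, v)) for label, p, v in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(i, label, s) for i, (label, s) in enumerate(scored, start=1)]
```

Because the logistic function needs both inputs to push the logit above zero, a gene with a near-perfect variant score but a weak phenotype match (or vice versa) ranks below a gene that scores well on both.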

Note

A variant can contribute under a dominant MOI and also, as part of a compound heterozygote, under a recessive MOI. In that case it receives two different Exomiser scores, and each MOI-specific score is returned as a separate reportEvent for that variant. The maximum Exomiser score across a variant's reportEvents is used to rank all of the returned variants.
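The ranking rule in this note can be sketched as follows. The tuple format for reportEvents is an assumption made for illustration; all reportEvents are still returned, and only the ordering uses each variant's best score.

```python
def rank_variants(report_events):
    """Order variants by their best score across all of their reportEvents.

    report_events: iterable of (variant_id, moi, exomiser_score) tuples,
    one per reportEvent (hypothetical format).
    """
    best = {}
    for variant_id, _moi, score in report_events:
        best[variant_id] = max(score, best.get(variant_id, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

For example, a variant scoring 0.4 under a dominant MOI and 0.8 as part of a recessive compound heterozygote is ranked using 0.8, ahead of another variant whose only reportEvent scores 0.6.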