Exomiser configuration
This is a complete description of the configuration used for Exomiser in the Rare Disease GMS. A summary of the general configurations applied is included below:
- Only variants with a "PASS" status are included
- Population frequency cut-offs are applied
- Predicted pathogenicity tools are enabled and contribute to the score/rank
- Inheritance filters are enabled and contribute to the score/rank
- The OMIM prioritiser is enabled and contributes to the score/rank
Population allele frequencies
The table below defines the allele frequency threshold for each inheritance pattern considered by Exomiser.
Inheritance pattern | Frequency threshold (%) |
---|---|
AUTOSOMAL_DOMINANT | 0.1 |
AUTOSOMAL_RECESSIVE_COMP_HET | 2.0 |
AUTOSOMAL_RECESSIVE_HOM_ALT | 0.1 |
X_DOMINANT | 0.1 |
X_RECESSIVE_COMP_HET | 2.0 |
X_RECESSIVE_HOM_ALT | 0.1 |
MITOCHONDRIAL | 0.2 |
The allele frequency cutoff is compared against the following populations.
Source | Sub-Population |
---|---|
ESP | ESP_AFRICAN_AMERICAN ESP_EUROPEAN_AMERICAN ESP_ALL |
ExAC | EXAC_AFRICAN_INC_AFRICAN_AMERICAN EXAC_AMERICAN EXAC_EAST_ASIAN EXAC_FINNISH EXAC_NON_FINNISH_EUROPEAN EXAC_SOUTH_ASIAN EXAC_OTHER |
gnomAD exomes | GNOMAD_E_AFR GNOMAD_E_AMR GNOMAD_E_EAS GNOMAD_E_FIN GNOMAD_E_NFE GNOMAD_E_OTH GNOMAD_E_SAS |
gnomAD genomes | GNOMAD_G_AFR GNOMAD_G_AMR GNOMAD_G_EAS GNOMAD_G_FIN GNOMAD_G_NFE GNOMAD_G_OTH GNOMAD_G_SAS |
Others | THOUSAND_GENOMES UK10K TOPMED |
Additionally, an internal Genomics England allele frequency dataset is used (see Exomiser database versions).
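The way the thresholds and population sources combine can be illustrated with a minimal Python sketch. It is not Exomiser's implementation. It assumes that the highest frequency observed across the listed populations is the value compared against each threshold, that a frequency equal to the threshold still passes, and that frequencies are expressed in the same units as the table. The function name `compatible_modes` is hypothetical.

```python
# Thresholds copied from the inheritance pattern table above.
FREQUENCY_THRESHOLDS = {
    "AUTOSOMAL_DOMINANT": 0.1,
    "AUTOSOMAL_RECESSIVE_COMP_HET": 2.0,
    "AUTOSOMAL_RECESSIVE_HOM_ALT": 0.1,
    "X_DOMINANT": 0.1,
    "X_RECESSIVE_COMP_HET": 2.0,
    "X_RECESSIVE_HOM_ALT": 0.1,
    "MITOCHONDRIAL": 0.2,
}


def compatible_modes(population_frequencies):
    """Return the inheritance modes whose threshold the variant does not exceed.

    population_frequencies maps a population name (e.g. "GNOMAD_E_NFE") to the
    allele frequency observed there, in the same units as the thresholds.
    A variant absent from every population is treated as frequency 0.
    """
    # Assumption: the maximum across all populations is what gets compared.
    max_frequency = max(population_frequencies.values(), default=0.0)
    return sorted(
        mode
        for mode, threshold in FREQUENCY_THRESHOLDS.items()
        # Assumption: a frequency equal to the threshold is still accepted.
        if max_frequency <= threshold
    )


if __name__ == "__main__":
    # Too common for the dominant and homozygous modes, but still a
    # candidate under the compound heterozygous modes.
    print(compatible_modes({"GNOMAD_E_NFE": 0.5, "ESP_ALL": 0.05}))
```

Under these assumptions, a variant seen at 0.5 in any one population remains a candidate only for the two compound heterozygous modes, because their threshold (2.0) is more permissive.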
Phenotype scoring algorithms
The HiPhive algorithm is configured to use human, mouse, and fish phenotype data, and to include protein-protein interaction proximities in phenotype scores.
Variant scoring algorithms
REVEL and MVP are configured as “pathogenicitySources”.
Note
REVEL is an ensemble method that includes data from PolyPhen.
Variant consequences
A full list of possible variant consequences is available here
The following variant consequences are filtered out and not considered.
Region | Specific consequence |
---|---|
Untranslated region (UTR) | FIVE_PRIME_UTR_EXON_VARIANT FIVE_PRIME_UTR_INTRON_VARIANT THREE_PRIME_UTR_EXON_VARIANT THREE_PRIME_UTR_INTRON_VARIANT |
Transcript | NON_CODING_TRANSCRIPT_EXON_VARIANT NON_CODING_TRANSCRIPT_INTRON_VARIANT CODING_TRANSCRIPT_INTRON_VARIANT |
Intergenic | UPSTREAM_GENE_VARIANT INTERGENIC_VARIANT REGULATORY_REGION_VARIANT |
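As an illustration of the effect of this filter, the sketch below drops any variant whose consequence appears in the table above. It is a simplified stand-in for Exomiser's own filter, not its code. The `(variant_id, consequence)` tuple layout and the function name `filter_variants` are assumptions made for the example.

```python
# Consequences copied from the table above.
EXCLUDED_CONSEQUENCES = frozenset({
    "FIVE_PRIME_UTR_EXON_VARIANT",
    "FIVE_PRIME_UTR_INTRON_VARIANT",
    "THREE_PRIME_UTR_EXON_VARIANT",
    "THREE_PRIME_UTR_INTRON_VARIANT",
    "NON_CODING_TRANSCRIPT_EXON_VARIANT",
    "NON_CODING_TRANSCRIPT_INTRON_VARIANT",
    "CODING_TRANSCRIPT_INTRON_VARIANT",
    "UPSTREAM_GENE_VARIANT",
    "INTERGENIC_VARIANT",
    "REGULATORY_REGION_VARIANT",
})


def filter_variants(variants):
    """Keep only variants whose consequence is not in the excluded set.

    variants is an iterable of (variant_id, consequence) tuples; this layout
    is hypothetical and chosen only to keep the example short.
    """
    return [
        (variant_id, consequence)
        for variant_id, consequence in variants
        if consequence not in EXCLUDED_CONSEQUENCES
    ]


if __name__ == "__main__":
    example = [
        ("var1", "MISSENSE_VARIANT"),
        ("var2", "INTERGENIC_VARIANT"),
        ("var3", "THREE_PRIME_UTR_EXON_VARIANT"),
    ]
    # Only the missense variant survives the filter.
    print(filter_variants(example))
```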
Short tandem repeat expansion maskings
The following STR loci showed a large number of artifacts caused by the variability between individuals. As these will be better handled by our dedicated STR caller we have excluded these regions from analysis in Exomiser:
- chrX:147912048-147912058 (FMR1)
- chr12:6936727-6936737 (ATN1)
- chr14:92071009-92071011 (ATXN3)
- chr20:46022942-46022952 (SLC12A5)
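A masking step of this kind can be sketched as a simple interval overlap check. The example below is illustrative only. It assumes the coordinates above are 1-based and inclusive, and that a variant is excluded if any of its bases overlaps a masked region. Neither detail is stated in this document, and the function name `masked_locus` is hypothetical.

```python
# Regions copied from the list above: (chromosome, start, end, gene label).
MASKED_STR_REGIONS = [
    ("chrX", 147912048, 147912058, "FMR1"),
    ("chr12", 6936727, 6936737, "ATN1"),
    ("chr14", 92071009, 92071011, "ATXN3"),
    ("chr20", 46022942, 46022952, "SLC12A5"),
]


def masked_locus(chrom, start, end=None):
    """Return the gene label of the masked region a variant overlaps, or None.

    start and end describe the variant's span on the reference; end defaults
    to start for a single-base variant. Coordinates are assumed to be
    1-based and inclusive.
    """
    if end is None:
        end = start
    for region_chrom, region_start, region_end, gene in MASKED_STR_REGIONS:
        # Two inclusive intervals overlap when each starts before the other ends.
        if chrom == region_chrom and start <= region_end and end >= region_start:
            return gene
    return None


if __name__ == "__main__":
    print(masked_locus("chrX", 147912050))   # inside the FMR1 region
    print(masked_locus("chrX", 147912059))   # one base past the region
```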