Limitations of the Rare Disease bioinformatics pipeline

A summary of the limitations of the Rare Disease bioinformatics pipeline is provided in the Summary of Findings available in the Interpretation Portal.

The rare disease bioinformatics pipeline has been through several iterations of innovation. As a result, the features and approaches included in the rare disease bioinformatics pipeline can differ between major NGIS releases. Examples of innovations include:

  • detection of CNVs between 2 and 10 kb (released in NGIS Danny release, July 2021)
  • prioritisation of variants with known pathogenic/likely pathogenic status (released in NGIS Izar release for ClinVar variants, March 2023)
  • upgrade of the DRAGEN software used for read mapping and variant calling to DRAGEN v4.0.5 (released in NGIS Mira release, April 2024)

The specific NGIS release version that the Statement of Limitations text relates to is included in the first paragraph.

This sample was processed through the rare disease bioinformatics pipeline included in the NGIS Mira release. Additional details of the Rare Disease analytical pipeline are available in the Rare Disease Genome Analysis Guide (available online at https://pipeline-rd-help.genomicsengland.co.uk/Mira, and through the NHS Futures Website).

The variants described below were selected by the NHS Genomic Laboratory Hub following review of prioritised variants from the Genomics England interpretation (tiering) pipeline. They may include single nucleotide variants and small insertions/deletions in the virtual gene panel(s) classified as Tier 1 or 2 and/or other types of prioritised variants that may be of relevance to the patient's phenotype. The variants identified here were detected from whole genome sequencing data with a variant prioritisation process that focused on protein coding genes, selected non-coding genes, and loci, in accordance with most genomic variation that is currently diagnostically reportable.

The single nucleotide variant (SNV), small insertion/deletion (indel) and copy number variant (CNV) tiering has been carried out based on the clinical indication and pedigree data as given in the referral. It is the responsibility of the reporting laboratory to check that this information is correct before issuing a clinical report.

Tiered SNVs and indels are rare variants that segregate with disease under the penetrance mode defined in the referral.

Tier 1 includes rare variants in the applied virtual gene panel that are:

  • high impact variants
    • predicted consequence types: stop-gain, stop-loss, start-loss, splice donor/acceptor, frameshift, transcript ablation
  • de novo variants predicted to be of functional consequence
    • only applies to genes associated with a phenotype with monoallelic mode of inheritance
  • ClinVar and/or Clinical Variant Ark (CVA) variants with at least one pathogenic or likely pathogenic assertion in genes in the virtual gene panel(s) applied for the patient
    • variants with the same protein change as a ClinVar variant or a CVA variant are also included for consideration* - this amino acid matching approach only applies to variants that were classified as pathogenic/likely pathogenic in CVA from the NGIS Izar release (March 2023) onwards

Tier 2 includes rare variants in the applied virtual gene panel that are:

  • moderate impact variants
    • predicted consequence types in protein-coding genes: missense, splice region variant (+/- 8bp from the nearest exon), in-frame insertion/deletion, transcript amplification, incomplete terminator codon
    • predicted consequence types in non-coding genes: non-coding transcript exon variant

Tier 3 includes rare variants outside of the applied virtual gene panel that are:

  • high impact variants
    • predicted consequence types: stop-gain, stop-loss, start-loss, splice donor/acceptor, frameshift, transcript ablation
  • moderate impact variants in protein-coding genes
    • predicted consequence types in protein-coding genes: missense, splice region variant (+/- 8bp from the nearest exon), in-frame insertion/deletion, transcript amplification, incomplete terminator codon
  • ClinVar and/or CVA variants with at least one pathogenic or likely pathogenic assertion in genes in the virtual gene panel(s) applied for the patient
    • variants with the same protein change as a ClinVar variant or a CVA variant are also included for consideration* – this amino acid matching approach only applies to variants that were classified as pathogenic/likely pathogenic in CVA from the NGIS Izar release (March 2023) onwards

*It is recommended that HGVSp. automated predictions are verified during the reporting process, particularly for variants with complex nomenclature.
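The tier definitions above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the pipeline's implementation. The `Variant` class, its field names, and the function `tier_small_variant` are hypothetical names introduced for illustration. The sketch assumes the variant has already passed the rarity and segregation filters, and it leaves out the amino acid matching rule and the ClinVar/CVA rule listed under Tier 3.

```python
from dataclasses import dataclass
from typing import Optional

# Consequence labels copied from the tier definitions in this document.
HIGH_IMPACT = {
    "stop-gain", "stop-loss", "start-loss",
    "splice donor/acceptor", "frameshift", "transcript ablation",
}
MODERATE_CODING = {
    "missense", "splice region variant", "in-frame insertion/deletion",
    "transcript amplification", "incomplete terminator codon",
}
MODERATE_NONCODING = {"non-coding transcript exon variant"}


@dataclass
class Variant:
    """Hypothetical record; assumed to be rare and to segregate with disease."""
    consequence: str
    in_panel: bool                    # gene is in the applied virtual gene panel
    protein_coding: bool = True
    de_novo_functional: bool = False  # de novo and predicted functional
    monoallelic_gene: bool = False    # gene-phenotype pair is monoallelic
    known_pathogenic: bool = False    # >= 1 P/LP assertion in ClinVar or CVA


def tier_small_variant(v: Variant) -> Optional[int]:
    """Return 1, 2 or 3, or None if no rule in this sketch applies."""
    if v.in_panel:
        if v.consequence in HIGH_IMPACT:
            return 1
        if v.de_novo_functional and v.monoallelic_gene:
            return 1
        if v.known_pathogenic:
            return 1
        if v.protein_coding and v.consequence in MODERATE_CODING:
            return 2
        if not v.protein_coding and v.consequence in MODERATE_NONCODING:
            return 2
        return None
    # Outside the applied panel: consequence-based Tier 3 rules only.
    if v.consequence in HIGH_IMPACT:
        return 3
    if v.protein_coding and v.consequence in MODERATE_CODING:
        return 3
    return None
```

For example, under these assumptions `tier_small_variant(Variant("missense", in_panel=True))` evaluates the Tier 1 rules first and then falls through to the Tier 2 moderate impact rule.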

Tiered CNVs are high quality calls >2 kb derived from the proband only. Tier A includes CNVs that overlap with genes or contain regions defined in the virtual gene panel(s) applied to the patient. Tier null includes high quality CNV calls >2 kb that overlap with neither genes nor regions defined in the virtual gene panel(s).
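The CNV rule above can be sketched in a few lines. This is an illustration only: the function name `tier_cnv` and its parameters are hypothetical, 2 kb is taken to mean 2,000 bp, and the quality and overlap decisions are assumed to have been made upstream and passed in as booleans.

```python
from typing import Optional

MIN_CNV_LENGTH_BP = 2_000  # assumption: ">2 kb" read as strictly more than 2,000 bp


def tier_cnv(length_bp: int, high_quality: bool, overlaps_panel: bool) -> Optional[str]:
    """Return "Tier A", "Tier null", or None if the call is not tiered."""
    if not high_quality or length_bp <= MIN_CNV_LENGTH_BP:
        return None  # low quality or too short: not tiered in this sketch
    return "Tier A" if overlaps_panel else "Tier null"
```

Returning `None` for calls that fail the size or quality check keeps "not tiered" distinct from the "Tier null" label, which the text reserves for high quality calls outside the panel.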

Short Tandem Repeats (STRs) are only included in the prioritised variants for specific loci defined in the virtual gene panel(s) applied to the patient.

It is possible that disease-causing variant(s) were not detected, for example because they are in a region of low coverage, low mappability, or poor sequence quality, they are of a type that could not be detected, they have a lower than expected allelic balance due to mosaicism or they are mitochondrial variants with very low heteroplasmy levels.

Variants may also not be included in this list of prioritised variants if the variant falls outside of the virtual gene panels applied, has a consequence type that is not prioritised, a population allele frequency above the threshold applied, a segregation pattern not considered or not in accordance with the mode of inheritance for pathogenic variants attributed to the relevant gene or entity, or the segregation pattern in the family is not as expected (for example, incomplete penetrance was not anticipated).

In some cases, biallelic STR expansions may be detected as monoallelic expansions and may not be included in the list of prioritised variants where the genotype is not in accordance with the anticipated mode of inheritance attributed to the STR. In these cases, the expansion will be reported as Tier null. For longer expansions where the length of the repeat is longer than the WGS read length (150 bp or 50 repeats for a triplet repeat), pre-expansions may not be reliably distinguished from full expansions. In such cases, the full expansion length may be underestimated and reported in either Tier 1 or Tier 2.

Please note that CNVs <2 kb and structural variants are not currently reported. All GENCODE Basic transcripts (Ensembl version 90, GRCh38) associated with specified biological significance categories are considered in the tiering algorithm. Further diagnostic or research analysis may lead to updated prioritised variants being issued in the future.
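The read-length limit quoted above for STR expansions (150 bp, or 50 repeats for a triplet repeat) is simple arithmetic, shown here as a short sketch. The function names are hypothetical, and the check only compares total repeat length with read length; it does not model how the pipeline sizes expansions.

```python
READ_LENGTH_BP = 150  # WGS read length stated in this document


def max_repeats_in_read(motif_length_bp: int, read_length_bp: int = READ_LENGTH_BP) -> int:
    """Number of whole repeat units that fit inside a single read."""
    return read_length_bp // motif_length_bp


def exceeds_read_length(repeat_count: int, motif_length_bp: int,
                        read_length_bp: int = READ_LENGTH_BP) -> bool:
    """True if the repeat tract is longer than one read, so sizing is less reliable."""
    return repeat_count * motif_length_bp > read_length_bp
```

For a triplet repeat, `max_repeats_in_read(3)` gives 50, matching the figure in the text.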

The estimated sensitivity and precision of the Rare Disease pipeline 2.0 for variant detection (not including tiering) of small variants, CNVs and STR expansions are summarised below. Estimates may be revised as availability of appropriate data for validation improves.
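For readers unfamiliar with the two metrics, the sketch below uses the standard definitions: sensitivity is the fraction of true variants that were called, and precision is the fraction of calls that are true variants. How the pipeline validation counted true positives, false negatives and false positives is not described here, so the function names and any counts passed to them are illustrative only.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """TP / (TP + FN): share of real variants that were detected."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """TP / (TP + FP): share of reported calls that are real variants."""
    return true_positives / (true_positives + false_positives)
```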

Mitochondrial DNA variants were not included in the calculation of sensitivity and precision and are outside the scope of ISO 15189 accreditation for the pipeline. The measurement of uncertainty and the limit of detection of heteroplasmic mitochondrial DNA variants were not determined.

Pipeline sensitivity and precision metrics