Overview
The Rare Disease Pipeline uses the GRCh38 genome reference. Sequencing reads are aligned to the genome reference, including decoy contigs and alternate haplotypes (ALT contigs), using the DRAGEN aligner, with graph-based ALT-aware mapping and variant calling to improve specificity. The graph-based method improves accuracy in difficult-to-map regions and segmental duplications. For further details, see DRAGEN graph mapper.
Alignments are stored in CRAM files, which contain both mapped and unmapped reads.
Small variants (single nucleotide variants (SNVs) and indels) and copy number variants (CNVs) are detected using the DRAGEN small variant caller and DRAGEN CNV respectively. Short tandem repeat (STR) expansions are detected using ExpansionHunter (v5) as part of the DRAGEN software.
DRAGEN software is used for alignment and variant calling. Small variants and CNVs are tiered and reported for chromosomes 1–22 and chrX. Small variants are also tiered and reported for the mitochondrial genome. STR expansions are detected and tiered at selected loci. Structural variants (SVs) are detected in the pipeline using a specialised SV caller integrated within DRAGEN software (DRAGEN SV, derived from Manta), but they are not tiered. Tiering is described further in sections 9.3, 9.5 and 9.6.2.
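The tiering scope in the paragraph above can be summarised as a small lookup. The following is a minimal sketch for illustration only, not pipeline code: the function name, the variant type labels and the `str_locus_selected` flag are hypothetical, and the contig name `chrM` for the mitochondrial genome is an assumption based on common GRCh38 naming. The list of selected STR loci is not given here, so the caller supplies that decision.

```python
# Illustrative sketch of the tiering scope described in the text.
# All names below are hypothetical and not part of the pipeline.

AUTOSOMES = {f"chr{i}" for i in range(1, 23)}   # chr1 .. chr22
NUCLEAR_TIERED = AUTOSOMES | {"chrX"}           # contigs tiered for small variants and CNVs
MITOCHONDRIAL = "chrM"                          # assumed GRCh38 contig name


def is_in_tiering_scope(variant_type, contig, str_locus_selected=False):
    """Return True if a variant of this type on this contig falls in the tiering scope."""
    if variant_type == "small_variant":
        # Small variants: chromosomes 1-22, chrX and the mitochondrial genome.
        return contig in NUCLEAR_TIERED or contig == MITOCHONDRIAL
    if variant_type == "cnv":
        # CNVs: chromosomes 1-22 and chrX only.
        return contig in NUCLEAR_TIERED
    if variant_type == "str":
        # STR expansions: only at selected loci (the list is not specified here).
        return str_locus_selected
    if variant_type == "sv":
        # Structural variants are detected but not tiered.
        return False
    raise ValueError(f"unknown variant type: {variant_type!r}")
```

For example, `is_in_tiering_scope("small_variant", "chrM")` is true under these assumptions, whereas `is_in_tiering_scope("cnv", "chrM")` and `is_in_tiering_scope("sv", "chr1")` are both false.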
DRAGEN software incorporates the inferred sex into variant calling such that the overall ploidy of the X chromosome is considered (with possible values of 1 or 2 copies), and haploid calls are produced where appropriate. Variant calling is performed assuming a haploid model for chromosome X for individuals inferred to have a single copy of chromosome X (for example, XY, XO, XYY karyotypes) and assuming a diploid model for individuals inferred to have two or more copies of chromosome X (for example, XX, XXX, XXY karyotypes). A summary of the alignment and variant calling process is shown below.
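The chromosome X ploidy rule described above can be expressed as a short function. This is a hedged sketch of the rule as stated in the text, not DRAGEN's implementation: both function names are hypothetical, and deriving the X copy number by counting `X` characters in a karyotype string is a simplification used only to reproduce the examples listed above.

```python
# Illustrative sketch of the chromosome X ploidy rule; not DRAGEN code.

def x_copies_from_karyotype(karyotype):
    """Count X chromosomes in a simple sex-karyotype string such as 'XY' or 'XXY'."""
    return karyotype.upper().count("X")


def chrx_calling_ploidy(x_copies):
    """Return the ploidy model for chromosome X variant calling.

    One inferred copy of X -> haploid model (1).
    Two or more inferred copies of X -> diploid model (2).
    """
    if x_copies < 1:
        raise ValueError("at least one copy of chromosome X is expected")
    return 1 if x_copies == 1 else 2


# Karyotype examples taken from the text above.
for karyotype in ("XY", "XO", "XYY", "XX", "XXX", "XXY"):
    copies = x_copies_from_karyotype(karyotype)
    print(karyotype, chrx_calling_ploidy(copies))
```

Under this sketch, XY, XO and XYY map to the haploid model, and XX, XXX and XXY map to the diploid model, matching the examples given in the text.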