de novo variant detection
The DRAGEN algorithm detects de novo small variants directly from gVCF files rather than from alignment files (which some other algorithms, such as Platypus, use). When gVCF files are generated, homozygous reference calls with similar quality scores at consecutive genomic positions are collapsed into a group and represented by a single entry in the resulting file. Consequently, metrics (e.g., quality scores, depth of coverage, allelic fractions, etc.) for individual sites with homozygous reference genotypes may not be available. Care should therefore be taken when interpreting the apparent allelic depth in parental samples at sites corresponding to de novo variants in their offspring (i.e., homozygous reference positions in the parents), as the metrics presented in VCF files may not correspond to the anticipated position.
This behavior applies to all chromosomes, including sex chromosomes and the mitochondrial genome. Allele counts for homozygous reference positions can be obtained directly from the CRAM file, for example by viewing the CRAM file in IGV or generating a pile-up using bcftools for a specified genomic position.
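To illustrate why a parental record may not sit at the anticipated position, the following minimal Python sketch looks up the gVCF record that covers a given position. It assumes uncompressed, tab-separated gVCF text lines and treats any record with an `END` key in its INFO column as a collapsed homozygous reference block. The function name `find_covering_record` and the example records are hypothetical and are not taken from the pipeline's output.

```python
from typing import Iterable, Optional


def find_covering_record(lines: Iterable[str], chrom: str, pos: int) -> Optional[dict]:
    """Return the first gVCF record whose span covers chrom:pos, or None."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[0] != chrom:
            continue
        start = int(fields[1])
        # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags
        info = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in fields[7].split(";")
        )
        is_block = "END" in info
        # A reference block spans POS..END; otherwise the span is the REF allele
        end = int(info["END"]) if is_block else start + len(fields[3]) - 1
        if start <= pos <= end:
            return {"chrom": chrom, "start": start, "end": end, "is_block": is_block}
    return None


# Hypothetical records: one collapsed reference block and one variant site
example = [
    "##fileformat=VCFv4.2",
    "chr1\t1000\t.\tA\t<NON_REF>\t.\tPASS\tEND=1050\tGT:DP\t0/0:31",
    "chr1\t1051\t.\tG\tT,<NON_REF>\t50\tPASS\t.\tGT:DP\t0/1:28",
]
record = find_covering_record(example, "chr1", 1025)
print(record["start"], record["end"], record["is_block"])
```

In this example, position 1025 is reported under the record that starts at position 1000, so any metrics on that record describe the block rather than the queried site.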
The DRAGEN de novo small variant detection algorithm determines all positions for which the genotypes in a trio are not consistent with a Mendelian inheritance pattern. Detection of de novo variants is not restricted to variant positions with homozygous reference genotypes for the parents and a heterozygous genotype for the offspring.
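The notion of Mendelian consistency can be sketched as a simple genotype check. This is not the DRAGEN implementation, which scores candidate sites with a posterior probability. It is an illustrative sketch that assumes diploid, fully called genotypes given as pairs of allele indices (0 for the reference allele), and the function name `is_mendelian_consistent` is hypothetical.

```python
from typing import Tuple

Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


# Classic de novo pattern: both parents homozygous reference, child heterozygous
print(is_mendelian_consistent((0, 1), (0, 0), (0, 0)))

# Also inconsistent, although neither parent is homozygous reference in both cases:
# the mother (1/1) cannot transmit a reference allele to a 0/0 child
print(is_mendelian_consistent((0, 0), (1, 1), (0, 1)))

# Consistent: alternate allele from the mother, reference allele from the father
print(is_mendelian_consistent((0, 1), (0, 1), (0, 0)))
```

The second call shows a trio that fails the check without matching the pattern of homozygous reference parents and a heterozygous offspring.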
Small variant de novo detection can assign a variant to one of three possible groups:
DN value | Description |
---|---|
Inherited | Genotype is consistent with Mendelian inheritance in the trio |
LowDQ | Genotype is inconsistent with Mendelian inheritance in the trio; DQ score is less than the quality threshold |
DeNovo | Genotype is inconsistent with Mendelian inheritance in the trio; DQ score is greater than or equal to the quality threshold |
Note
De novo quality (DQ) scores are posterior probability scores calculated by considering the possible genotypes within the trio, with the probability of error assumed to be independent for each member of the trio. The rare disease bioinformatics pipeline applies default DQ thresholds for SNVs (≥0.0013) and indels (≥0.02).
Note
Only de novo variants with quality (DQ) scores that match or exceed the thresholds indicated in the note above are considered by the Genomics England variant tiering approach and displayed to users in the Interpretation Portal for further consideration. DQ scores are not shown to the user in the portal, but they are present in the VCF file available for download.
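The DN table and the default thresholds above can be combined into a short decision rule. The sketch below is illustrative only: the function name `assign_dn` and its parameters are hypothetical, and it assumes that Mendelian consistency and the DQ score have already been determined for the variant.

```python
# Default DQ thresholds applied by the rare disease pipeline, as stated above
SNV_DQ_THRESHOLD = 0.0013
INDEL_DQ_THRESHOLD = 0.02


def assign_dn(mendelian_consistent: bool, dq: float, is_indel: bool) -> str:
    """Return the DN value for a variant following the table above."""
    if mendelian_consistent:
        return "Inherited"
    threshold = INDEL_DQ_THRESHOLD if is_indel else SNV_DQ_THRESHOLD
    # Scores that match or exceed the threshold are classed as DeNovo
    return "DeNovo" if dq >= threshold else "LowDQ"


print(assign_dn(mendelian_consistent=False, dq=0.01, is_indel=False))  # SNV above threshold
print(assign_dn(mendelian_consistent=False, dq=0.01, is_indel=True))   # indel below threshold
```

The same DQ score of 0.01 yields different DN values for an SNV and an indel because the two variant types have different thresholds.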