
Quality control

Genomic sequencing data are subject to a series of quality control (QC) checks, performed by the Genomics England automated pipeline, to ensure that they are of sufficient quality and suitable for processing.

The following QC checks are completed as part of the pipeline:

Data integrity: an md5sum check confirms the integrity of the genomic data transferred from the sequencing provider.

Genome coverage: alignments must cover at least 95% of the reference genome at 15x or above.

Coverage is calculated using bases from reads with mapping quality >10, counting only sequences that remain in the output of Illumina’s current aligner after:
- removal of overlapping bases
- removal of duplicated reads
- adaptor trimming
- quality trimming of read ends
- clipping of semi-aligned reads

(Note: saliva samples are exempt from this check.)

Base quality: the sequence data for each sample must contain more than 85 × 10^9 bases with quality ≥30.

This threshold must be met by non-duplicated reads, without double-counting overlapping bases in the same fragment, after adaptor trimming and quality trimming.

Contamination: germline cross-sample contamination is estimated using VerifyBamID. Samples with >3% contamination fail this check.
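
The data integrity check amounts to recomputing each file's MD5 digest and comparing it with the checksum supplied by the sequencing provider. The Python sketch below illustrates the idea; the function names `md5_of_file` and `verify_md5` and the 1 MiB chunk size are illustrative assumptions, not part of the Genomics England pipeline.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_md5):
    """Hypothetical helper: True if the file's MD5 matches the provider's checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Demonstration on a small temporary file standing in for a delivered sequencing file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"hello\n")
    try:
        print(verify_md5(path, "b1946ac92492d2347c6235b4d2611184"))  # True
    finally:
        os.remove(path)
```

Reading in chunks keeps memory use constant regardless of file size, which matters for multi-gigabyte sequencing files.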
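
The coverage and base quality thresholds both come down to counting. A minimal sketch, assuming per-base depths have already been computed from reads that passed the filters above (mapping quality >10, duplicates and overlapping bases removed, trimming applied) and that base qualities are Phred scores encoded with the common +33 FASTQ offset. All function names are hypothetical, and a real implementation would stream over the roughly 3 × 10^9 positions of the reference rather than hold them in a list.

```python
def breadth_at_depth(depths, min_depth=15):
    """Fraction of reference positions whose (filtered) depth is at least min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def phred_from_fastq(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality_string]


def count_q30_bases(read_qualities, min_quality=30):
    """Count bases with Phred quality >= min_quality across all reads.

    read_qualities is an iterable of per-read Phred score lists, assumed to come
    from non-duplicated, adaptor- and quality-trimmed reads in which overlapping
    mate bases have already been counted once only.
    """
    return sum(sum(1 for q in quals if q >= min_quality) for quals in read_qualities)


if __name__ == "__main__":
    print(breadth_at_depth([20, 14, 30, 15]))  # 0.75
    print(count_q30_bases([phred_from_fastq("I5#"), [30, 35]]))  # 3
```

Under the thresholds above, a sample would pass the coverage check when the breadth at 15x is at least 0.95, and the base quality check when the Q30 count exceeds 85 × 10^9.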

Note

  • Adaptor trimming: when adaptor sequences are found at the ends of reads, they are clipped. The adaptor sequences can be specified on the command line or in the sample sheet.
  • Quality trimming: when the average base quality at the 3' end of a read is below a given threshold (15), the end of the read is trimmed.
  • Semi-aligned read clipping: when a large number of mismatches accumulate at the end of a read, possibly indicating an indel or a structural variant, the mismatching end of the read is soft clipped.
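
The adaptor and quality trimming steps above can be sketched in Python as follows. This is not Illumina's implementation: adaptor matching here is exact only (with a hypothetical minimum overlap of 3 bases for partial adaptors at the read end), and quality trimming uses the running-sum rule known from BWA and cutadapt, swapped in as one way to remove a 3' tail whose average quality falls below the threshold of 15 given in the text. The adaptor sequence in the example is illustrative. Semi-aligned read clipping depends on the alignment itself and is not shown.

```python
def trim_adaptor(seq, quals, adaptor, min_overlap=3):
    """Clip an adaptor from the 3' end of a read (exact matching only).

    A full adaptor occurrence is clipped from its first position onwards;
    otherwise the longest adaptor prefix (at least min_overlap bases,
    a hypothetical default) that ends the read is clipped.
    """
    pos = seq.find(adaptor)
    if pos != -1:
        return seq[:pos], quals[:pos]
    longest = min(len(adaptor) - 1, len(seq))
    for length in range(longest, min_overlap - 1, -1):
        if seq.endswith(adaptor[:length]):
            cut = len(seq) - length
            return seq[:cut], quals[:cut]
    return seq, quals


def quality_trim_3prime(seq, quals, threshold=15):
    """Trim a low-quality 3' tail using a BWA/cutadapt-style running sum.

    Walking in from the 3' end, accumulate (threshold - q). The cut is placed
    where this sum peaks, so the removed tail has a mean quality below
    `threshold`. The scan stops as soon as the sum turns negative, i.e. once
    the bases scanned so far average above `threshold`.
    """
    running = 0
    best = 0
    cut = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += threshold - quals[i]
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return seq[:cut], quals[:cut]


if __name__ == "__main__":
    read = "ACGTACGTAGATCGG"
    quals = [30] * len(read)
    print(trim_adaptor(read, quals, "AGATCGGAAGAGC")[0])  # ACGTACGT
    print(quality_trim_3prime("ACGTACGTAC", [30] * 6 + [2] * 4)[0])  # ACGTAC
```

A short minimum overlap removes more genuine adaptor fragments but also clips bases that match the adaptor by chance; production trimmers balance this with mismatch tolerance and error rates.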

Samples that do not pass these criteria are reported to the NHS GLHs via the Sample Failures Report. Saliva-derived DNA samples are exempt from the minimum coverage requirement, and a LOW_COVERAGE flag is displayed in the Interpretation Portal for any sample that does not pass the coverage QC metric (highlighted by the red box in the image below).

[Figure: Low coverage flag in the Interpretation Portal]
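
The pass/fail logic described in this section can be summarised as a small decision function. In this sketch the `SampleMetrics` fields, the failure labels and the `QcResult` type are hypothetical; contamination is assumed to be given as a fraction (such as VerifyBamID's FREEMIX estimate), and saliva samples below the coverage threshold are assumed to receive the LOW_COVERAGE flag instead of a failure.

```python
from dataclasses import dataclass, field

MIN_BREADTH_15X = 0.95       # at least 95% of the reference at >= 15x
MIN_Q30_BASES = 85 * 10**9   # must be MORE than 85 x 10^9 bases with quality >= 30
MAX_CONTAMINATION = 0.03     # more than 3% contamination fails


@dataclass
class SampleMetrics:
    """Hypothetical per-sample QC metrics."""
    sample_id: str
    md5_ok: bool
    breadth_15x: float    # fraction of the reference covered at >= 15x
    q30_bases: int
    contamination: float  # fraction, e.g. 0.012 for 1.2%
    is_saliva: bool = False


@dataclass
class QcResult:
    """Hypothetical result: failures go to the failure report, flags to the portal."""
    failures: list = field(default_factory=list)
    flags: list = field(default_factory=list)

    @property
    def passed(self):
        return not self.failures


def evaluate_qc(m):
    result = QcResult()
    if not m.md5_ok:
        result.failures.append("DATA_INTEGRITY")
    if m.breadth_15x < MIN_BREADTH_15X:
        if m.is_saliva:
            # Saliva samples are exempt from the coverage check: flag, do not fail.
            result.flags.append("LOW_COVERAGE")
        else:
            result.failures.append("GENOME_COVERAGE")
    if not m.q30_bases > MIN_Q30_BASES:
        result.failures.append("BASE_QUALITY")
    if m.contamination > MAX_CONTAMINATION:
        result.failures.append("CONTAMINATION")
    return result


if __name__ == "__main__":
    saliva = SampleMetrics("S1", True, 0.90, 90 * 10**9, 0.01, is_saliva=True)
    r = evaluate_qc(saliva)
    print(r.passed, r.flags)  # True ['LOW_COVERAGE']
```

Note the strict comparisons: exactly 85 × 10^9 Q30 bases fails, while exactly 3% contamination passes, mirroring "more than" and ">3%" in the criteria.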