B-Allele Frequency Plots
Access to B-Allele Frequency Plots
From the Orion release onwards, B-Allele Frequency plots generated by the Genomics England Rare Disease bioinformatics pipeline are available for download for all samples through the Interpretation Portal. Further instructions on how to access these plots are available here.
Background
B-Allele frequency plots provided by Genomics England display a combination of metrics that are useful for the detection of features and events that may not be immediately accessible through other approaches for variant detection.
B-Allele frequencies are quantified as "the proportion of sequencing read support for an alternate allele ('B-Allele') in comparison to the reference genome".
Hypothetically, we expect certain B-Allele frequencies in specific scenarios on diploid chromosomes:
Scenario | Genotype | Expected B-Allele frequency |
---|---|---|
Reference Homozygous | 0/0 | 0.0 |
Alternate Heterozygous | 0/1 | 0.5 |
Alternate Homozygous | 1/1 | 1.0 |
Normal ranges do differ from these hypothetical values, but when considered in combination with other datasets (coverage and copy number count), B-Allele frequencies can be useful to identify and characterise several event types, including:
- uniparental disomy
- regions of homozygosity
- large mosaic events (e.g. copy number variants, uniparental disomy)
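As a minimal sketch of the definition above, the B-Allele frequency at a single site can be computed from the reference and alternate read counts. The function name and the read counts below are hypothetical illustrations and are not taken from the Genomics England pipeline.

```python
def b_allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Proportion of reads at a site that support the alternate ('B') allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / total


# Hypothetical read counts for one site of each genotype in the table above.
examples = [("0/0", 30, 0), ("0/1", 16, 14), ("1/1", 0, 32)]
for genotype, ref_reads, alt_reads in examples:
    print(genotype, round(b_allele_frequency(ref_reads, alt_reads), 2))
```

The heterozygous example (14 of 30 reads) gives a value slightly below 0.5, which illustrates why observed values scatter around the hypothetical ones.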
Limitations of B-Allele frequency plots
B-Allele frequencies are calculated from genomic sequencing datasets, and are currently limited to genomic sites that pass quality filters and are covered by at least 10 sequencing reads. At an average genome-wide sequencing coverage of 30-40x, B-Allele frequency plots can be useful for the detection of large CNVs, particularly large mosaic CNVs, and other genomic events that are not detected or prioritised through other approaches in the Genomics England rare disease bioinformatics pipeline.
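The site selection described above can be sketched as a simple filter. This is an illustration only: the `Site` record, the use of `"PASS"` as the filter value, and the definition of depth as reference plus alternate reads are all assumptions, and the pipeline's actual filters are not documented on this page.

```python
from dataclasses import dataclass

MIN_DEPTH = 10  # minimum read depth stated in the text


@dataclass
class Site:
    chrom: str
    pos: int
    ref_reads: int
    alt_reads: int
    filter_status: str  # assumed to mirror a VCF FILTER value such as "PASS"


def usable_sites(sites):
    """Keep sites that pass quality filters and are covered by at least MIN_DEPTH reads."""
    return [
        s for s in sites
        if s.filter_status == "PASS" and s.ref_reads + s.alt_reads >= MIN_DEPTH
    ]


# Hypothetical sites: only the first one satisfies both conditions.
sites = [
    Site("chr8", 1_000_000, 18, 15, "PASS"),
    Site("chr8", 1_000_500, 4, 3, "PASS"),        # 7 reads: below MIN_DEPTH
    Site("chr8", 1_001_200, 20, 17, "LowQual"),   # fails the quality filter
]
kept = usable_sites(sites)
```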
However, there are known limitations, including the level of mosaicism that can be detected, the range of allele fraction distributions that can be observed, and the static nature of the plots provided.
We strongly encourage using B-Allele frequency plots alongside other datasets provided by the Genomics England rare disease bioinformatics pipeline, including alignment files and structural variant vcfs, to detect and characterise complex genomic events. This may include, but is not limited to, assessment of the orientation of read pairs, the soft clipping of sequencing reads, and the relative read coverage within informative regions of the genome (e.g. pseudoautosomal regions on chrX to infer minor sex karyotypes). Moreover, where the quantitative level of mosaicism is explicitly considered, limited inferences can be made from the level of read support for variants within the small variant vcfs.
Information available in B-Allele frequency plots
As shown below, B-Allele frequency plots produced by the Rare Disease bioinformatics pipeline contain three different data types:
- top panel: copy number states
- middle panel: sequencing read coverage values
- bottom panel: B-Allele frequencies for small variants
Orange lines indicate general trends in different sections of the plots.
B-Allele frequency plots are provided for all autosomal and sex chromosomes on a single plot, and also individually for each chromosome. The resolution of chromosomal regions is higher in individual chromosome plots.
Examples of typical B-Allele frequency plots
Typical whole genome plot
Typical chromosome X plot, XY karyotype
Typical autosomal chromosome plot
There are several features to note in the typical plots included above:
Feature | Observations |
---|---|
karyotypic sex | The whole genome plot suggests this sample is from an individual with an XY karyotype. Specifically, the coverage values and copy number states indicate presence of 1 copy each for chromosome X and Y. Of note, the pseudoautosomal (PAR) regions of chrX, which are also present on chrY, have features consistent with a diploid state (CN=2). PAR1 (~first 3Mb of chrX) is viewable on the chromosome X single chromosome plot above. PAR2 is not easily viewable. |
normal copy number state | There are no clear indications of large copy number gains or losses across any chromosomes, and this can be confirmed for individual chromosomes (if appropriate to do so) on individual chromosome plots, shown above for chr8. This can be interpreted by the presence of: (1) normal copy number state, (2) no obvious deviation in coverage values, and (3) no abnormalities in B-Allele frequencies - please see additional examples on this page for key features of B-Allele frequency distributions indicative of abnormal copy number state. |
distribution of B-Allele frequencies | The plots above show the typical distributions of B-Allele frequencies in a sample from an individual with an XY karyotype. There is natural deviation from the hypothetical values suggested in the table above. This reflects both real biological variation and the protocols put in place to increase interpretability of the plots. There are two trends observable in the whole genome and autosomal chromosome plots that are consistent with normal diploid states: (1) examples of B-Allele frequencies close to 0 or 1, indicative of homozygous sites, and (2) examples of B-Allele frequencies around 0.5 (approx range of 0.3-0.7), indicative of heterozygous sites. |
general trends (solid lines) for coverage and B-Allele frequencies | Solid lines are provided to indicate general trends observed for sequencing coverage and B-Allele frequencies. The regions that are plotted as solid lines mirror the regions that are reported in the CNV vcf, and therefore may not perfectly align with the boundaries of other detectable features (e.g. homozygous regions or regions implicated in uniparental disomies). For coverage, the solid line marks the median coverage and the line colour reflects the CNV status. For B-Allele frequencies, the orange line represents the modal (i.e. most common) value for non-homozygous B-allele frequencies, and is included for all regions with >250 non-homozygous B-Allele frequency observations. |
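
The trend lines described in the table above can be approximated as follows. This is a sketch under stated assumptions rather than the pipeline's implementation: only the median coverage, the modal non-homozygous value and the >250 observation threshold come from the text, while the margin used to exclude homozygous sites (`HOM_MARGIN`) and the bin width used to take the mode (`BIN_WIDTH`) are hypothetical choices.

```python
from collections import Counter
from statistics import median

MIN_OBSERVATIONS = 250  # from the text: a region needs >250 non-homozygous observations
HOM_MARGIN = 0.1        # assumption: values within 0.1 of 0 or 1 count as homozygous
BIN_WIDTH = 0.02        # assumption: values are binned before the mode is taken


def median_coverage(depths):
    """Median read depth across the sites in one CNV region."""
    return median(depths)


def modal_baf(bafs):
    """Most common binned non-homozygous B-Allele frequency, or None if too few observations."""
    non_hom = [b for b in bafs if HOM_MARGIN < b < 1 - HOM_MARGIN]
    if len(non_hom) <= MIN_OBSERVATIONS:
        return None  # no trend line is drawn for this region
    bins = Counter(round(b / BIN_WIDTH) for b in non_hom)
    most_common_bin, _count = bins.most_common(1)[0]
    return most_common_bin * BIN_WIDTH
```

Because the mode is taken over a whole CNV region, a feature that covers only part of that region can be hidden by the majority signal.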
Note
Copy number variants detected with high confidence in a proband will be considered through variant tiering approaches.
Note
Regions proximal to the centromere and telomeres may appear "messier" (i.e. show a greater range of B-Allele frequencies) or may lack data. This is expected, and is due to the difficulty of alignment and variant detection in these regions with short-read sequencing technologies.
Note
Because trends are calculated over the regions used for CNV detection, regions that are not considered for CNV detection have no trend line, and the solid lines are unlikely to be representative of trends associated with uniparental disomies.
Mosaic CNV viewable in B-Allele frequency plots
This example illustrates trends indicative of a mosaic copy number gain, in this case impacting the whole of chromosome 8.
In both the whole genome and single chromosome plots, it can be observed that chromosome 8 has B-Allele Frequencies that deviate from the values typical for heterozygous variants. This occurs across the complete length of chromosome 8.
On the whole genome plot, it can also be observed that there is a slight increase in the coverage values across the length of the chromosome, although it is not large enough to be detected as a change in predicted copy number.
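A simplified model helps to explain why the B-Allele frequencies shift noticeably while coverage rises only slightly. The sketch below assumes a single-copy gain that duplicates one existing haplotype in a fraction of cells; the function name and the model are illustrative assumptions and are not part of the pipeline. Gains involving a third distinct haplotype will behave differently at some sites.

```python
def mosaic_gain_expectations(cell_fraction: float):
    """Expected heterozygous BAF bands and coverage ratio for a mosaic single-copy gain.

    cell_fraction is the proportion of cells (0 to 1) carrying the extra copy.
    Returns (baf_low, baf_high, coverage_ratio).
    """
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must be between 0 and 1")
    mean_copies = 2.0 + cell_fraction                # average copies per cell
    baf_high = (1.0 + cell_fraction) / mean_copies   # the B allele is on the gained copy
    baf_low = 1.0 / mean_copies                      # the reference allele is on the gained copy
    coverage_ratio = mean_copies / 2.0               # relative to a diploid region
    return baf_low, baf_high, coverage_ratio


# Hypothetical example: the gain is present in 20% of cells.
low, high, ratio = mosaic_gain_expectations(0.2)
print(round(low, 3), round(high, 3), round(ratio, 2))
```

Under this model, a gain present in 20% of cells splits the heterozygous band into two bands near 0.45 and 0.55, while coverage increases by only about 10%.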
Regions of homozygosity
This example illustrates trends associated with regions of homozygosity (ROH) present on several chromosomes.
The characteristic features of ROHs can be observed: B-Allele frequencies fall almost exclusively at 0 or 1, while the coverage and copy number panels clearly indicate that two copies of the chromosome are present.
This scenario may occur due to consanguinity, uniparental isodisomy or a balanced CNV event where the region lost on one copy is replaced with that region from its pair.
These trends can also be seen in the individual chromosome plots, and are shown here for chromosome 1, with ROH impacting approximate regions 50-60 Mb and 90-120 Mb.
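The visual pattern described above can be approximated numerically by looking for windows that contain almost no heterozygous sites. This is a rough sketch, not the method used to produce the plots: the window size, the heterozygous fraction cut-off and the minimum site count are hypothetical parameters, and only the approximate heterozygous range of 0.3-0.7 comes from this page.

```python
HET_RANGE = (0.3, 0.7)  # approximate heterozygous range quoted on this page


def low_het_windows(sites, window_bp=1_000_000, max_het_fraction=0.02, min_sites=50):
    """Flag fixed-size windows on one chromosome with very few heterozygous sites.

    sites is an iterable of (position, baf) pairs. Returns a list of (start, end) windows.
    """
    counts = {}  # window index -> (total sites, heterozygous sites)
    for pos, baf in sites:
        idx = pos // window_bp
        total, het = counts.get(idx, (0, 0))
        is_het = HET_RANGE[0] <= baf <= HET_RANGE[1]
        counts[idx] = (total + 1, het + int(is_het))

    flagged = []
    for idx in sorted(counts):
        total, het = counts[idx]
        if total >= min_sites and het / total <= max_het_fraction:
            flagged.append((idx * window_bp, (idx + 1) * window_bp))
    return flagged
```

A flagged window is only a candidate ROH. As noted above, the coverage and copy number panels must also indicate two copies, because a single-copy deletion would produce the same absence of heterozygous sites.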