Illumina’s DRAGEN (Dynamic Read Analysis for GENomics) secondary analysis software is a toolbox that addresses important challenges in analyzing next-generation sequencing (NGS) data. We use different DRAGEN features to analyze many of our DNA products sequenced on Illumina sequencing platforms. With the latest DRAGEN update, novel features and options are available – to us and to you!
Variant calling featuring mosaic detection
The typical allele frequencies of small variants in germline samples are 0%, 50% or 100%: the variant occurs on neither of the alleles (0%), on one of the alleles (50%), or on both alleles (100%) in all cells. Unfortunately, biology is not always typical. DNA changes that occur after the zygote stage throughout life lead to different DNA sequences in an individual’s cells. Thus, some cells have a different genetic makeup than others. This phenomenon is called mosaicism. Post-zygotic mosaic variants typically have allele frequencies ranging from less than 1% to approximately 50%, depending on how large the proportion of cells with post-zygotic variants is. Especially low mosaic allele frequencies can be hard to detect as they might be mistaken for background noise.
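To make the link between the proportion of affected cells and the observed allele frequency concrete, here is a minimal sketch in Python. It assumes a diploid locus, a variant on one allele in the affected cells, and independent read sampling (a simple binomial model). The function names are illustrative and are not part of DRAGEN. The sketch does not reflect how DRAGEN's machine learning model works internally.

```python
from math import comb


def expected_allele_frequency(cell_fraction, copies_with_variant=1, ploidy=2):
    """Expected allele frequency if `cell_fraction` of the cells carry the
    variant on `copies_with_variant` out of `ploidy` chromosome copies."""
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must be between 0 and 1")
    if not 0 <= copies_with_variant <= ploidy or ploidy <= 0:
        raise ValueError("copies_with_variant must be between 0 and ploidy")
    return cell_fraction * copies_with_variant / ploidy


def prob_at_least(min_alt_reads, depth, allele_frequency):
    """Binomial probability of seeing at least `min_alt_reads` variant reads
    among `depth` reads (assumes independent reads, no sequencing errors)."""
    p_fewer = sum(
        comb(depth, k)
        * allele_frequency ** k
        * (1.0 - allele_frequency) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Example: a post-zygotic variant present in 4% of the cells.
af = expected_allele_frequency(0.04)
for depth in (30, 100, 500):
    print(depth, round(prob_at_least(3, depth, af), 3))
```

Under these assumptions, a variant present in all cells on one allele gives 50%, and a variant in a small fraction of cells gives half of that fraction. The loop shows how the chance of sampling a handful of supporting reads depends on the sequencing depth, which is why low-frequency mosaic variants are easily confused with background noise.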
With the new DRAGEN version, a mosaic machine learning model improves the sensitivity of low allele frequency calls, which makes the detection of mosaic small variants possible. Additionally, a mosaic detection mode is active for copy number variations.
Thus, the new DRAGEN version detects mosaic variants with higher sensitivity, allowing for more comprehensive variant analyses and supporting research on mosaic disorders.
Figure 1 | Allele frequencies of “typical” germline small variants and mosaic variants.
Improved CNV calling for our somatic products based on allele-specific copy number (ASCN) analysis
Calling copy number variations is especially challenging for tumor samples. Such samples are a mixture of tumor and normal cells, which distorts the variant signal. For proper variant calling, the tumor purity and ploidy need to be estimated so that the read counts can be corrected accordingly. DRAGEN’s allele-specific copy number (ASCN) module identifies the most likely tumor purity prior to variant calling by using observed read counts and B-allele frequencies. The B-allele frequency is the ratio of the number of B-alleles (the non-reference alleles) to the total number of alleles at a given single nucleotide polymorphism (SNP) position. Within DRAGEN’s ASCN module, the B-allele ratios in the tumor are calculated to allow for allele-specific copy number calling on the tumor samples. If available, a matched normal sample is preferred for this calculation (tumor-normal mode). If no normal sample is available, a general catalog of population SNPs is used instead in tumor-only mode. With this ASCN module, the calling of copy number variations in tumor samples is improved.
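The effect of tumor purity on the B-allele frequency can be illustrated with a short Python sketch. It uses the common textbook mixture model for a heterozygous SNP, in which normal cells contribute one B-allele out of two copies. It is not DRAGEN's implementation, and the function name and parameters are assumptions made for illustration.

```python
def expected_baf(purity, tumor_b_copies, tumor_total_copies,
                 normal_b_copies=1, normal_total_copies=2):
    """Expected B-allele frequency at a heterozygous SNP in a sample that
    mixes tumor cells (fraction `purity`) with normal cells.

    Textbook mixture model, assumed here for illustration only:
    each cell population contributes alleles in proportion to its
    cell fraction and its copy number at this position.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= tumor_b_copies <= tumor_total_copies:
        raise ValueError("tumor_b_copies must be between 0 and tumor_total_copies")
    b_alleles = purity * tumor_b_copies + (1.0 - purity) * normal_b_copies
    all_alleles = purity * tumor_total_copies + (1.0 - purity) * normal_total_copies
    if all_alleles == 0:
        raise ValueError("no allele copies present at this position")
    return b_alleles / all_alleles


# Example: the tumor has lost the B-allele (one copy left, zero B-alleles).
for purity in (0.2, 0.5, 0.8):
    print(purity, round(expected_baf(purity, 0, 1), 3))
```

In this model, the same tumor event produces a different B-allele frequency at each purity, so the observed signal cannot be interpreted without a purity estimate. A balanced state (for example two B-alleles out of four copies) stays at 0.5 regardless of purity, which is why the read counts are needed as a second source of information.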
Innovative pangenome usage
Correctly identifying all types of variants is key to understanding human health and disease mechanisms, and to discovering novel disease targets or genetic markers with medical significance. To identify an individual’s variants correctly, the sequenced reads must be mapped accurately to an appropriate reference. With the new DRAGEN version, a pangenome reference can now be used for human samples. This pangenome reference consists of a human reference and more than 100 assemblies across 26 ancestries to account for errors in the human reference genome. With the additional assemblies, variants from multiple genomes are included to better represent the sequence diversity throughout the human population. The pangenome-based reference is intended to improve the mapping accuracy and the downstream variant calling. According to Illumina, using the pangenome reference for human samples decreases the number of false negative and false positive variant calls by more than 60% compared to a linear reference. Consequently, we follow Illumina’s recommendation and use the pangenome reference for human samples.
Figure 2 | Pangenome as a reference. Several assemblies are used together with the reference genome for the mapping step of the reads. By using a pangenome as a reference instead of a linear reference, variants from multiple genomes are included to better represent the sequence diversity throughout the human population. Variants that are common amongst all assemblies can be found in the core genes.
Extended annotations file with additional information
After variant calling, the resulting VCF files are annotated using Illumina’s Nirvana software, which provides clinical-grade annotations of genomic variants. The software generates a structured JSON file as output that contains all annotation and sample information. Although JSON is a human-readable, text-based format for storing and exchanging data between systems, the annotated JSON file is difficult to read and comprehend. For this reason, we provide an additional annotations file with selected information in a tabular format (TSV file). Among other things, this tabular file already contains the chromosomal position and observed variant, the functional consequence of the variant in the context of a transcript, the position and sequence change in the context of the affected transcript, and information about the observed variant in the global population. With the new update, we added further information from the JSON file to the tabular file, namely information from the external data sources ClinVar and COSMIC:
- ClinVar is a public archive reporting the relationships between human variations and phenotypes, facilitating the understanding of the variant’s relationship with the observed health status of an individual.
- COSMIC stands for Catalogue of Somatic Mutations in Cancer. It is the world’s largest source of manually curated somatic mutation information concerning human cancers.
Information from other databases and external data sources can be found in the JSON file. For the sake of clarity, we have only included information from ClinVar and COSMIC in the tabular file. Together with the annotation files, we provide annotation documentation in which all columns are explained.
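The idea of turning a nested annotation JSON into a flat table can be sketched as follows. This is a simplified illustration and not our production code. The field names (`positions`, `variants`, `clinvar`, `cosmic` and so on) are assumptions loosely modeled on the annotation JSON, the column names are hypothetical, and the input record is a made-up placeholder rather than real data.

```python
import csv
import io
import json

# Made-up placeholder record; field names are assumptions for illustration.
EXAMPLE_JSON = """
{
  "positions": [
    {
      "chromosome": "chr1",
      "position": 12345,
      "refAllele": "A",
      "altAlleles": ["G"],
      "variants": [
        {
          "clinvar": [{"id": "VCV_EXAMPLE", "significance": ["benign"]}],
          "cosmic": [{"id": "COSM_EXAMPLE"}]
        }
      ]
    }
  ]
}
"""

# Hypothetical column layout of the tabular output.
COLUMNS = ["chromosome", "position", "ref", "alt",
           "clinvar_ids", "clinvar_significance", "cosmic_ids"]


def flatten_annotations(doc):
    """Return one flat dict per variant, joining list values with ';'."""
    rows = []
    for pos in doc.get("positions", []):
        for variant in pos.get("variants", []):
            clinvar = variant.get("clinvar", [])
            cosmic = variant.get("cosmic", [])
            rows.append({
                "chromosome": pos.get("chromosome", ""),
                "position": pos.get("position", ""),
                "ref": pos.get("refAllele", ""),
                "alt": ",".join(pos.get("altAlleles", [])),
                "clinvar_ids": ";".join(e.get("id", "") for e in clinvar),
                "clinvar_significance": ";".join(
                    s for e in clinvar for s in e.get("significance", [])
                ),
                "cosmic_ids": ";".join(e.get("id", "") for e in cosmic),
            })
    return rows


def to_tsv(rows):
    """Serialize the flat rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_tsv(flatten_annotations(json.loads(EXAMPLE_JSON))))
```

Each variant becomes one row, and entries from a data source that has several records for the same variant are joined into a single cell. Variants without ClinVar or COSMIC entries simply get empty cells in this sketch.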
With these new features and options included in our standard analysis of DNA samples sequenced on our Illumina sequencing systems, we make full use of Illumina’s DRAGEN update. We have tested the new features and options intensively to ensure that our results remain of high quality. We are happy to offer these new features and options for your next sequencing project.


