Read, Read Pairs, and Clusters – Who’s Who in Next Generation Sequencing?

In next-generation sequencing (NGS), the terms read, read pair, and cluster come up constantly, and they are not always easy to tell apart. What do they mean? To understand the difference, we need to understand how next-generation sequencing works.

As an example, we will follow the workflow for paired-end RNA sequencing on Illumina platforms step by step, as illustrated in Figure 1:

  • We start with the RNA sequences. For coding transcriptome sequencing, the poly(A) RNA is enriched; for whole transcriptome sequencing, rRNA is depleted.
  • Next, the RNA is converted to cDNA, which is more stable and easier to amplify than RNA. During library preparation, adapters are added to the cDNA strands; these adapters facilitate the binding of the strands to the sequencing platform.
  • When using Illumina’s patterned flow cell technology, the flow cell holds billions of so-called nanowells at fixed locations across its surface. Approximately 1,000 DNA probes are located in these nanowells to capture the prepared cDNA strands. These DNA probes bind either the 3’ adapter (blue in the figure) or the 5’ adapter (green in the figure). In the next step, each cDNA strand is hybridized via one of its adapters to one DNA probe in a nanowell on the flow cell. In this step, it is not relevant which adapter (3’ or 5’) binds to a probe. It is, however, important that only one cDNA strand binds in each nanowell.
  • In the next step, the cDNA strand bends over, and its unbound adapter binds to the complementary probe on the flow cell, forming a bridge; this is what gives bridge amplification its name. This amplification step is repeated many times in rapid succession. Through bridge amplification, a single cDNA molecule is amplified to about 1,000 copies within one nanowell. These identical copies of the initial cDNA strand in a particular nanowell are called a cluster. The strands in a cluster are subsequently sequenced together, which increases the signal and allows more accurate detection of the sequence.
  • During bridge amplification, strands are bound to the flow cell surface with either their 3’ or their 5’ end, so a cluster contains both forward and reverse strands. For the first part of the sequencing process, the reverse strands are removed, and only the forward strands remain. These are then sequenced with a sequencing-by-synthesis approach, which generates the forward read, also called read 1. Thus, a read is the sequenced output from one cluster, represented as a string of nucleotides. It is the actual data output of the sequencing run.
  • In paired-end sequencing mode, the strand is sequenced from both ends. Up to this point, it has been sequenced from only one end. To generate the reverse read, another bridge amplification step is performed. This time, however, the forward strands are removed, and the remaining reverse strands are sequenced using the sequencing-by-synthesis approach to generate the reverse read, also called read 2.
  • After these two sequencing rounds, read 1 (the forward read) and read 2 (the reverse read) have been generated from one cluster. These two reads form a read pair, as they come from the same cluster and represent the same initial cDNA strand.
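
In the data files, the link between a cluster and its read pair usually shows up in the read names. The following minimal sketch assumes the common Illumina-style FASTQ header layout (instrument, run, flow cell, lane, tile, x and y position, followed by the read number after a space); the two headers are made up for illustration, and real headers may deviate from this layout depending on the instrument and software version.

```python
from typing import NamedTuple


class ClusterId(NamedTuple):
    """Position of a cluster on the flow cell, taken from the read name."""
    instrument: str
    run: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def parse_header(header: str) -> tuple[ClusterId, int]:
    """Split an Illumina-style FASTQ header into cluster identity and read number."""
    # Everything before the first space names the cluster;
    # the first field after the space is the read number (1 or 2).
    name, _, description = header.lstrip("@").partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    read_number = int(description.split(":")[0])
    cluster = ClusterId(instrument, run, flowcell, int(lane), int(tile), int(x), int(y))
    return cluster, read_number


# Hypothetical headers, invented for this example (not real sequencing data).
r1_header = "@SIM:1:FCX:1:1101:1000:2000 1:N:0:ACGT"
r2_header = "@SIM:1:FCX:1:1101:1000:2000 2:N:0:ACGT"

cluster_1, read_no_1 = parse_header(r1_header)
cluster_2, read_no_2 = parse_header(r2_header)

# Two reads form a read pair if they come from the same cluster
# and are read 1 and read 2, respectively.
is_read_pair = cluster_1 == cluster_2 and {read_no_1, read_no_2} == {1, 2}
print(is_read_pair)
```

Under these assumptions, both reads carry the same flow cell coordinates and differ only in the read number, which is how they can be recognized as one read pair from one cluster.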

This workflow also applies to DNA sequencing on Illumina platforms: the DNA is fragmented, but no cDNA synthesis is required. Consequently, clusters, reads, and read pairs are also generated for paired-end DNA sequencing.
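
For RNA and DNA libraries alike, the relationship between the three terms comes down to simple arithmetic: each cluster yields one read in single-end mode, or two reads (one read pair) in paired-end mode. The sketch below illustrates this; the function name `sequencing_yield` and the numbers are hypothetical, and it assumes that every cluster produces full-length reads, which a real run will not achieve exactly.

```python
def sequencing_yield(clusters: int, read_length: int, paired_end: bool = True) -> dict[str, int]:
    """Relate clusters, read pairs, reads, and bases (idealized: no reads are lost)."""
    reads_per_cluster = 2 if paired_end else 1
    reads = clusters * reads_per_cluster
    return {
        "clusters": clusters,
        "read_pairs": clusters if paired_end else 0,  # one pair per cluster
        "reads": reads,
        "bases": reads * read_length,
    }


# Illustrative numbers only: 1 million clusters, 150 cycles per read.
example = sequencing_yield(clusters=1_000_000, read_length=150)
print(example)
```

In this example, 1 million clusters correspond to 1 million read pairs, that is, 2 million reads and 300 million sequenced bases. This is worth keeping in mind when comparing sequencing depths, since numbers quoted as "reads" and as "read pairs" differ by a factor of two in paired-end mode.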

Figure 1 | Workflow of the paired-end sequencing of RNA on Illumina platforms.

August 7, 2025 | Sequencing |