How whole-genome sequencing works with high-throughput methods: the case of Illumina (Julien)
Next-generation sequencing (NGS) differs from older methods in its speed and in the number of bases it can sequence. NGS usually fragments the genome randomly into small pieces, whereas an older method such as Sanger sequencing reads one DNA fragment at a time. In NGS, all these small pieces can then be sequenced in a massively parallel way.
Several NGS methods exist, such as single-molecule real-time sequencing (SMRT), pyrosequencing and nanopore sequencing. During my internship, we sequence bacterial genomes of 2 to 6 Mbp on a MiSeq made by Illumina. The method this technology uses is called "sequencing by synthesis".
The MiSeq workflow (like that of the other Illumina models) has four basic steps: library preparation, cluster generation, sequencing and data analysis.
a) Library preparation
After DNA extraction, samples are fragmented for massively parallel sequencing. This step breaks the DNA strands into small pieces, and adapters are ligated to their 5' and 3' ends. Adapter-ligated fragments are then PCR-amplified and gel-purified. These adapters are essential for binding the DNA fragments to the sequencer.
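As a rough illustration of what library preparation does to the DNA, the sketch below cuts a toy sequence into pieces of random length and attaches an adapter to each end. The adapter sequences, fragment sizes and function names are invented for the example; they are not the real Illumina adapters, and real fragmentation is a wet-lab step, not a computation.

```python
import random

# Hypothetical adapter sequences, made up for this example only.
ADAPTER_5 = "AAAACCCC"
ADAPTER_3 = "GGGGTTTT"

def fragment(genome, min_len, max_len, rng):
    """Cut the genome into consecutive pieces of random length."""
    pieces = []
    pos = 0
    while pos < len(genome):
        size = rng.randint(min_len, max_len)
        pieces.append(genome[pos:pos + size])  # the last piece may be shorter
        pos += size
    return pieces

def ligate_adapters(piece):
    """Attach one adapter to each end of a fragment."""
    return ADAPTER_5 + piece + ADAPTER_3

rng = random.Random(42)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200))  # toy 200-base genome
library = [ligate_adapters(p) for p in fragment(genome, 20, 40, rng)]
print(len(library), "adapter-ligated fragments")
```

Every fragment in the toy library now starts and ends with a known sequence, which is what lets the sequencer capture fragments whose own sequence is unknown.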
b) Cluster generation
The DNA fragments are loaded into a part of the sequencer called the "flow cell". Its surface carries a lawn of bound oligos complementary to the library adapters. It is important to know that one of the adapters is complementary to the surface-bound oligos, while the other has the same sequence as them.
Figure: the flow cell compartment. A: flow cell stage; B: flow cell compartment door; C: flow cell clamp; D: flow cell; E: flow cell clamp release button.
Each fragment bound to the lawn is then copied by a polymerase. The resulting forward strand is complementary to the template DNA. The template is washed away, which frees the second adapter at the end of the new strand: the strand folds over and binds to its complementary surface-bound oligonucleotide, like a bridge. This single-stranded bridge is then copied to make a double-stranded bridge. After denaturation, the two strands are separated and linearized again. This process is repeated many times to form a cluster of copies of one DNA fragment. Millions of clusters are made simultaneously, one for each DNA fragment. After bridge amplification, the reverse strands are cleaved and washed off, leaving only the forward strands.
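To get a feel for how quickly a cluster grows, here is a small sketch that assumes every bridge amplification cycle exactly doubles the number of strands. That is an idealisation, and the target of 1000 copies per cluster is an arbitrary figure chosen for the example, not a value from the MiSeq documentation.

```python
import math

def copies_after(cycles):
    """Copies of one fragment if every cycle doubles the strands (idealised)."""
    return 2 ** cycles

def cycles_needed(target_copies):
    """Smallest number of doubling cycles that reaches the target."""
    return math.ceil(math.log2(target_copies))

TARGET = 1000  # assumed cluster size, for illustration only
n = cycles_needed(TARGET)
print(n, copies_after(n))  # 10 1024
```

Under this assumption, ten cycles are already enough to turn one bound fragment into about a thousand copies.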
c) Sequencing by synthesis
Sequencing begins with the extension of the first sequencing primer to produce the first read. Fluorescently labelled nucleotides are added one by one, and after each addition a light source excites the cluster, which emits a fluorescent signal. This signal makes it possible to detect whether an A, T, G or C has been incorporated.
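The base-detection idea can be sketched with a simplified model in which each cycle yields four intensities for a cluster, one per base, and the strongest channel gives the base. The intensity values and function names are invented for the example; the real instrument software is much more involved.

```python
BASES = "ACGT"

def call_base(intensities):
    """Return the base whose channel has the strongest signal."""
    strongest = max(range(len(BASES)), key=lambda i: intensities[i])
    return BASES[strongest]

def call_read(cycles):
    """Call one base per sequencing cycle and join them into a read."""
    return "".join(call_base(c) for c in cycles)

# Invented intensities for one cluster: one (A, C, G, T) tuple per cycle.
signal = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.0, 0.8, 0.2),
    (0.0, 0.1, 0.1, 0.7),
    (0.2, 0.9, 0.1, 0.0),
]
print(call_read(signal))  # AGTC
```

Each cycle adds exactly one base to the read, so the number of cycles sets the read length.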
After read 1, the strand folds over and binds again, like a bridge, to a complementary adapter. Read 2 is then sequenced in the same way. Sequencing both strands of the DNA, even though they are complementary, is useful for comparing the two reads and checking for sequencing errors. These are called "paired-end reads". Moreover, paired-end reads have a higher quality than single-end reads for WGS and allow different methods to be used for data analysis [1].
For information, bacterial WGS is often done at 2×250 cycles, which gives relatively short reads.
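The comparison between the two reads of a pair can be sketched as follows. Read 2 comes from the opposite strand, so it is reverse-complemented before being compared with read 1. The sketch assumes a toy fragment short enough for both reads to cover it entirely; the sequences and function names are made up for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(read1, read2):
    """Positions where read 1 disagrees with the reverse complement of read 2.

    Assumes both reads cover the same fragment over its full length.
    """
    flipped = reverse_complement(read2)
    return [i for i, (a, b) in enumerate(zip(read1, flipped)) if a != b]

read1 = "ATGCGTAC"
read2 = "GTACGGAT"  # toy read from the opposite strand
print(mismatches(read1, read2))  # [2]
```

A disagreement such as the one at position 2 points to a likely sequencing error in one of the two reads.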
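A 2×250 run gives two reads of 250 bases per fragment, which makes it easy to estimate the average coverage of a genome. The sketch below uses a 5 Mbp genome, within the 2 to 6 Mbp range mentioned earlier; the target of 50× coverage is an assumption chosen for the example, not a recommendation from this post.

```python
READ_LENGTH = 250     # bases per read, from the 2x250 run
READS_PER_PAIR = 2

def coverage(read_pairs, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return read_pairs * READS_PER_PAIR * READ_LENGTH / genome_size

def pairs_needed(target_coverage, genome_size):
    """Read pairs required to reach a target average coverage."""
    bases_per_pair = READS_PER_PAIR * READ_LENGTH
    # Ceiling division on integers: round up to a whole read pair.
    return -(-(target_coverage * genome_size) // bases_per_pair)

GENOME_SIZE = 5_000_000  # toy 5 Mbp bacterial genome
print(pairs_needed(50, GENOME_SIZE))     # 500000
print(coverage(500_000, GENOME_SIZE))    # 50.0
```

In this example, half a million read pairs are enough to sequence each base of the genome 50 times on average.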
d) Data analysis
The files containing the paired-end reads are then processed by a bioinformatics pipeline (which will be the subject of another blog post).
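As a first taste of what such files look like, the sketch below assumes the reads are stored in FASTQ, the usual text format for Illumina reads, with four lines per record and Phred+33 quality characters. The read names and sequences are invented, and this is not the pipeline used during the internship.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality string) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        sequence = handle.readline().rstrip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip()
        yield header[1:], sequence, quality

def mean_quality(quality):
    """Mean Phred score, assuming the Phred+33 encoding."""
    return sum(ord(ch) - 33 for ch in quality) / len(quality)

# Two invented records standing in for a real file.
toy = io.StringIO("@read_1\nACGT\n+\nIIII\n@read_2\nTTGA\n+\n5555\n")
for name, seq, qual in read_fastq(toy):
    print(name, seq, mean_quality(qual))
```

With paired-end data, read 1 and read 2 are typically stored in two separate files whose records appear in the same order.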
1. Nakazato T, Ohta T, Bono H. Experimental design-based functional mining and characterization of high-throughput sequencing data in the Sequence Read Archive. PLoS One. 2013;8(10):e77910.