© Springer International Publishing AG 2017
Christoph Bleidorn, Phylogenomics, 10.1007/978-3-319-54064-1_3

3. Sequencing Techniques

Christoph Bleidorn
(1)
Museo Nacional de Ciencias Naturales, Spanish National Research Council (CSIC), Madrid, Spain
 
  • Sanger sequencing is based on the chain termination method which relies on separating DNA by size and the incorporation of labelled modified nucleotides.
  • 454 pyrosequencing measures the amount of light produced by the incorporation of nucleotides in a cascade of enzymatic reactions under the presence of a luciferase.
  • Reversible terminator sequencing is a sequencing-by-synthesis approach where the incorporation of modified nucleotides is detected stepwise.
  • Ion semiconductor sequencing analyses changes of hydrogen ion concentration during the incorporation of nucleotides into the DNA strand.
  • Single-molecule real-time (SMRT) sequencing monitors without interruption the incorporation of differently fluorescent-tagged nucleotides by the polymerase activity.
  • Nanopore sequencing detects the identity of nucleotides within the DNA strand while it is passing through a nanopore.
  • The availability of next-generation sequencing (NGS) platforms transformed the field of genomics and led to a dramatic decrease in sequencing costs.

3.1 Sanger Sequencing

Two DNA sequencing techniques were developed in the mid-1970s. Allan Maxam and Walter Gilbert proposed a chemical cleavage method, which was initially widely used in molecular laboratories (Maxam and Gilbert 1977). Around the same time, Frederick Sanger and colleagues developed a chain termination method (Sanger et al. 1977). After the chemistry needed for this method became commercially available, Sanger sequencing became the standard sequencing technique for all applications. For nearly three decades, Sanger sequencing was synonymous with DNA sequencing. DNA sequencing revolutionized many fields of biological and medical sciences, and fittingly Frederick Sanger and Walter Gilbert were awarded the Nobel Prize in Chemistry in 1980, which they shared with Paul Berg (Brenner 2014).
Sanger’s chain termination method is based on two principles: (I) DNA can be separated by size and (II) DNA polymerases are able to incorporate modified nucleotides. As DNA is negatively charged, gel electrophoresis allows the separation of DNA strands by size, as larger DNA strands migrate more slowly towards the positive electrode. By using polyacrylamide gels, it is even possible to resolve length differences of a single nucleotide between two DNA strands. After separating DNA into single strands, one strand can be copied by a DNA polymerase, which adds one nucleotide after another complementary to the template DNA strand. Requirements are a DNA primer (oligonucleotide) that fits to the template and the availability of nucleotides (dNTPs) for incorporation. Nucleotides bear a 3′ OH group and a 5′ phosphate group which will be connected by the polymerase while removing two of the phosphates of the 5′ end. However, if the 3′ OH group is missing, it is impossible to connect another nucleotide, and the elongation of the newly synthesized strand is terminated. Sanger used exactly such modified nucleotides, which have an H at the 3′ end instead of the OH group (dideoxynucleotides, ddNTPs), for his sequencing reaction. The sequencing reaction mix is composed of DNA polymerase, a primer for the target region, and a mix of dNTPs and ddNTPs. The ratio of dNTPs/ddNTPs is usually around 100 to 1, so that termination is obtained at least once for every position of the DNA template during the amplification process. Whenever a ddNTP is incorporated, elongation stops, thereby generating DNA molecules of different sizes. Using electrophoresis, these differently sized DNA molecules can be separated, and the identity of the incorporated ddNTP allows the nucleotide sequence to be reconstructed (◘ Fig. 3.1). Initially, ddNTPs were labelled radioactively, and reactions were performed for each of the four bases separately.
Huge gels had to be inspected by eye to reconstruct the sequence order, which is one reason why DNA output from sequencers is still known as reads. Later on, unique fluorescent labels for the four different bases (adenine, thymine, guanine and cytosine) were introduced, which could be detected by a laser while migrating through the gel. This innovation strongly decreased analysis time, as all reactions could run at the same time, and increased the accuracy of base detection (Prober et al. 1987). Computer-based analyses automatically associate the detected fluorescence with the corresponding nucleotide to generate a chromatogram for every sequence read (◘ Fig. 3.1).
Fig. 3.1
Chromatogram of a Sanger sequencing run. All four bases are labelled differently and are separated according to their size
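The logic of reading a sequence from chain-terminated fragments can be illustrated with a small sketch. It is a toy model and not the software of any sequencer: the template string is hypothetical, every position is assumed to be terminated at least once, and the size separation by electrophoresis is represented simply by sorting fragments by length.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template):
    """Return (length, terminal_base) for every terminated fragment.

    The synthesized strand is complementary to the template. A fragment
    that stops at position i ends in the labelled ddNTP that pairs with
    template[i].
    """
    synthesized = [COMPLEMENT[base] for base in template]
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_gel(fragments):
    """Sort fragments by size (stand-in for electrophoresis) and read labels."""
    return "".join(label for _, label in sorted(fragments))


# Hypothetical template, written in the order the polymerase copies it.
template = "TACGGATC"
fragments = chain_termination_fragments(template)

# The order in which fragments are produced does not matter,
# because the size separation restores the positional order.
read = read_gel(reversed(fragments))
print(read)  # ATGCCTAG
```

The read is the complement of the template, which is why only the label of the last nucleotide of each fragment has to be known.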
Current machines for high-throughput analyses like the ABI 3730xl are equipped with 96 capillaries. A single sequencing run takes 2–3 h, depending on the envisaged quality, and yields high-quality sequence reads of up to 1000 bp. The output of a single run is around 100 Kb, and sequencing 1 Mb within 24 h might be realistic (Liu et al. 2012). After trimming the ends of sequencing reads, an accuracy of 99.999% can be achieved, and sequence errors are mainly due to errors in preceding amplification steps (Kircher and Kelso 2010). As such, Sanger sequencing remains a good option for high-quality sequence reads. The first human genome sequence was generated with this technique.

3.2 454 Pyrosequencing

The first next-generation sequencing (NGS) technique that became commercially available was 454 sequencing. This technique presented solutions for each of the bottlenecks typically found in large-scale classical Sanger sequencing: library preparation, template preparation and sequencing itself (Rothberg and Leamon 2008). The principle of 454 sequencing goes back to a real-time sequencing approach called pyrosequencing developed by Ronaghi et al. (1996), and a highly parallelized high-throughput automatization was presented by Margulies et al. (2005). The great advantage of this method compared to Sanger sequencing is that sequences are read out while being synthesized, therefore omitting electrophoresis steps for size separation of DNA fragments. In this approach, nucleotides are released one after another and washed over the template DNA strand. A cascade of enzymatic reactions leads to the emission of a detectable light signal whose strength is proportional to the number of nucleotides incorporated during this step. For example, if C’s are washed over the template DNA, the polymerase incorporates as many consecutive C’s as are present in the target sequence. For each nucleotide incorporated by the polymerase, a pyrophosphate is released, which is subsequently converted to ATP by ATP sulphurylase (◘ Fig. 3.2). Luciferases present in the reaction use this energy to oxidize luciferin, which leads to the generation of light (Ronaghi 2001). After detection of the signal, superfluous nucleotides are removed in a washing step, and the next nucleotide is provided in a subsequent flow cycle.
Fig. 3.2
Principles of 454 pyrosequencing in picotiter well plates. The dNTP cytosine is flushed over the wells containing beads carrying DNA fragments and reaction mix. Incorporation of a dNTP by the polymerase leads to the release of a pyrophosphate, which is converted to ATP by ATP sulphurylase. Luciferases present in the well use this energy to oxidize luciferin, which leads to the generation of light (Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Genetics (Metzker 2010), copyright 2009)
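The relation between nucleotide flows and light signal can be sketched as an idealized flowgram. This is a simplified model: the flow order "TACG" and the example read are assumptions for illustration, and the signal is taken to be exactly proportional to the homopolymer length, without the noise that causes indel errors in real data.

```python
def flowgram(read, flow_order="TACG", max_flows=100):
    """Return a list of (nucleotide, signal) tuples, one per flow.

    `read` is the strand being synthesized. The signal of a flow is the
    number of identical consecutive bases incorporated in that flow,
    and 0 if the flowed nucleotide does not match the next base.
    """
    signals = []
    position = 0
    flow = 0
    while position < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while position < len(read) and read[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, run))
        flow += 1
    return signals


def call_bases(signals):
    """Translate an ideal flowgram back into a base sequence."""
    return "".join(nucleotide * run for nucleotide, run in signals)


signals = flowgram("TTACCCG")
print(signals)              # [('T', 2), ('A', 1), ('C', 3), ('G', 1)]
print(call_bases(signals))  # TTACCCG
```

Because a homopolymer is read in a single flow, any misjudgement of the signal strength translates directly into an inserted or deleted base.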
For the preparation of sequencing libraries, DNA is linked with special adaptors by sequential ligation and subsequently separated into single strands. One of these adaptors allows the binding to beads by hybridization. During a process called emulsion PCR, library fragments are loaded onto beads and amplified. To do this, a water mixture containing the capture beads, sequencing libraries and PCR reagents is mixed with synthetic oil in a plastic vessel. Vigorous shaking leads to the formation of droplets around the beads, which by chance will usually contain a single DNA library fragment. As each droplet also contains all PCR reagents, amplification of the library fragment produces millions of copies which bind to the capture bead. After the PCR is finished, the beads are cleaned from the oil, and those which do not carry DNA are removed. A key to high-throughput sequencing with this method is the use of picotiter well plates for sequencing (◘ Fig. 3.2). These plates bear approximately 1.6 million wells with a volume of 75 picoliters (Margulies et al. 2005). These wells are designed to exactly fit a single bead which carries the DNA fragments to be sequenced. Each well is filled with a capture bead and smaller beads carrying enzymes like ATP sulphurylase and luciferases. As described above, free nucleotides are washed over the well plate, and light emission is detected by a high-resolution camera. The number of filled wells determines the number of sequences to be generated in a single run.
Sequencers of Roche’s 454 GS FLX series can produce around 1,000,000 sequences with an average read length of 700 bp and reads up to 1000 bp. A single run takes around 24 h, with a total output of about 700 Mb. Sequences generated by 454 are of high quality, with an accuracy of ~99.75% after removing reads with N’s (Huse et al. 2007). Sequencing errors are mainly indels (insertions and deletions), often found in stretches of homopolymers (Gilles et al. 2011). Due to the comparatively low output, 454 sequencing lost its relevance for genome projects. Nevertheless, especially for high-throughput metabarcoding studies based on amplicon sequencing, this technique is still frequently used (Yu et al. 2012; Petrosino et al. 2009), although Roche decided to stop supporting the 454 platform in 2016.

3.3 Reversible Terminator Sequencing (Illumina)

By far the most widely used NGS platform is Illumina sequencing. Also known under its former company name as Solexa sequencing, this technique is based on cyclic reversible terminator technology. The sequencing reaction takes place on a flow cell, where literally billions of sequences can be processed during a single run. Based on a sequencing-by-synthesis approach, all four nucleotides are added simultaneously to the flow cell, together with a polymerase. Similar to Sanger sequencing, the synthesis reaction is stopped after incorporating a modified base. Every incorporated nucleotide is chemically blocked at its 3′ OH group and carries a removable fluorophore which can be identified by laser. Most Illumina sequencers (GA II, HiSeq, MiSeq) use a system with four colours, one for each base. Based on four different images coming from four different colour channels, bases are called for each sequence. The Illumina NextSeq system uses only two channels for base detection. Here, red refers to a C and green to a T, mixed signals (red and green) are interpreted as an A, and the absence of a dye refers to a G. As only two images are needed for base calling, the detection becomes faster. After detection, the blocking group and the fluorophore are removed, and the process of sequencing is continued by incorporating the next nucleotide (Bentley et al. 2008). In contrast to Sanger sequencing, all incorporated nucleotides terminate the elongation process and carry fluorophores.
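The two-channel colour code described above can be written down as a small lookup table. The sketch below only illustrates the encoding given in the text (red = C, green = T, both = A, none = G); the intensity values and the threshold of 0.5 are hypothetical, and real base callers work on images and use considerably more elaborate signal processing.

```python
# (red_on, green_on) -> base, following the two-channel scheme in the text
TWO_CHANNEL_CODE = {
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (True, True): "A",    # mixed signal
    (False, False): "G",  # no dye detected
}


def call_cycle(red, green, threshold=0.5):
    """Call one base from normalized red and green intensities (assumed scale 0-1)."""
    return TWO_CHANNEL_CODE[(red >= threshold, green >= threshold)]


def call_read(intensities, threshold=0.5):
    """Call a read from a list of (red, green) intensity pairs, one per cycle."""
    return "".join(call_cycle(r, g, threshold) for r, g in intensities)


# Hypothetical intensities for four sequencing cycles of one cluster
cluster = [(0.9, 0.1), (0.1, 0.8), (0.9, 0.9), (0.0, 0.1)]
print(call_read(cluster))  # CTAG
```

One consequence of this encoding is that a cluster producing no signal at all is indistinguishable from a run of G's.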
For library preparation, two different adaptors (P5 and P7) are ligated to the ends of all DNA molecules (◘ Fig. 3.3a). In an additional step, indices can be added by an indexing PCR creating unique libraries, which allows pooling during sequencing. In contrast to other barcoding approaches, the indices are placed within the adaptors and not at the ends of the molecules to be sequenced (Meyer and Kircher 2010). The resulting library is amplified with longer primers which further extend the adaptor sequences. In the next steps, the double-stranded library is separated into single strands which are pumped into the flow cell. Two different types of short oligonucleotides which are complementary to the ends of the library adaptors are distributed as anchors across the flow cell. These anchors hybridize to the ends of the adaptors, and after adding nucleotides and a polymerase, the single-stranded molecules are copied starting from the hybridized anchor region. The newly synthesized double-stranded molecule is denatured, and the original template DNA strand is washed away. As a result, all newly synthesized strands are covalently attached to the flow cell. In a step called bridge amplification, the single strands start to bend over to hybridize at their end with adjacent free anchor oligonucleotides (◘ Fig. 3.3b). Again, the hybridized primer is extended by polymerase, which leads to the formation of a double-stranded bridge. This bridge is denatured, resulting in two copies of covalently bound single-stranded DNA templates. The bridge amplification step is repeated several times until clusters of some thousand copies are generated. In the end, the bridges are again denatured, and all reverse strands are cleaved and washed away. After blocking of the free 3′ ends, the sequencing as described above begins. While sequencing, the number of cycles determines the number of nucleotides to be sequenced.
With current machines and chemistry, usually 96–250 cycles are used to produce sequences of corresponding length. It is also possible to use paired-end (PE) sequencing, which means that both ends of a single molecule from the sequencing library are determined. In this case, the blocking of the free sequence ends has to be removed, and a new bridge is formed by bending of the sequence ends to corresponding adjacent anchors (◘ Fig. 3.3c). The single strand is replicated using a polymerase, and afterwards the double strands are separated again. This time, all original forward strands are cleaved and washed away, and after blocking free ends, sequencing begins again.
Fig. 3.3
Illumina sequencing library preparation and bridge amplification. a For library preparation, two different adaptors are ligated to the end of sheared DNA fragments. b The sequencing library is pumped into the flow cell and can bind to short oligonucleotides on its surface, which are complementary to adaptor sequences. Single-stranded molecules are copied starting from the hybridized anchor region. Newly synthesized double-stranded molecules are denatured, and the original template DNA strand is washed away. Single-stranded strands start to bend over to hybridize at their end with adjacent free anchor oligonucleotides, thereby building a bridge. Multiple PCR cycles are used to generate clusters of clonal sequences. c To generate single-stranded templates for sequencing by synthesis, DNA fragments are linearized by cleavage within one adaptor sequence and subsequently denatured. For paired-end sequencing, the DNA template forms a bridge, and the second strand of the DNA fragment is synthesized. This time, cleavage will take place in the opposite adaptor region to provide a template for sequencing (Reprinted by permission from Macmillan Publishers Ltd: Nature (Bentley et al. 2008), copyright 2008)
Illumina sequencing can currently generate the biggest output of sequence data, and several different platforms are available. The Illumina HiSeq models are able to generate around 600 million sequences per lane of a flow cell, which comprises eight lanes altogether, resulting in nearly 5 billion sequences for a full run. Depending on the number of cycles used, this would produce 750 Gb of sequence output for 150 cycles. Including copying the data, such a run would last around 10 days. Faster «rapid» runs using flow cells with only two lanes, which also produce «only» around 66% of the output per lane, are available, and these can be finished in 24 h. In January 2017, Illumina announced the release of the NovaSeq models, which will replace the HiSeq series in the near future. Using NovaSeq, an output of up to 3 Tb in a 40 h run is envisaged. Two Illumina machines are available as desktop machines: the NextSeq 500 and the MiSeq. The NextSeq 500 platform runs a single lane with an output of up to 400 million sequences. The Illumina MiSeq is the cheapest machine, and depending on the model, the output of its single-lane flow cell is between 5 and 25 million sequences. At ~99.25%, the accuracy of Illumina sequencing is slightly lower than for Sanger or 454 (Quail et al. 2012). However, the error profile differs from 454, as substitutions are found much more frequently than indels, which are usually easier to handle in downstream analyses like mapping or assembly.
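The output figures quoted for the HiSeq follow from simple arithmetic: reads per lane times lanes times cycles. The short sketch below reproduces this calculation with the numbers from the text. The helper function `run_output` is a hypothetical name introduced for illustration. The exact product for eight lanes is 4.8 billion reads and 720 Gb, so the 750 Gb given above corresponds to rounding the read number up to 5 billion.

```python
def run_output(reads_per_lane, lanes, cycles):
    """Return (total_reads, total_bases) for a single-end run."""
    total_reads = reads_per_lane * lanes
    total_bases = total_reads * cycles
    return total_reads, total_bases


# Full HiSeq run: ~600 million reads per lane, 8 lanes, 150 cycles
hiseq_reads, hiseq_bases = run_output(600_000_000, 8, 150)

# «Rapid» run: 2 lanes with ~66% of the standard output per lane
rapid_reads, rapid_bases = run_output(600_000_000 * 66 // 100, 2, 150)

print(f"HiSeq full run:  {hiseq_reads / 1e9:.1f} billion reads, {hiseq_bases / 1e9:.0f} Gb")
print(f"HiSeq rapid run: {rapid_reads / 1e6:.0f} million reads, {rapid_bases / 1e9:.1f} Gb")
```

The same calculation applies to any platform where the read length is fixed by the number of cycles.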
Illumina sequencing is currently the standard for transcriptome sequencing and the resequencing of genomes. The read length of earlier chemistry and machines was limited to between 36 bp and 76 bp. However, with the availability of longer reads, this technique is now also frequently used in metabarcoding and metagenomic studies.

3.4 Ion Semiconductor Sequencing (Ion Torrent)

The Personal Genome Machine (PGM) released by Ion Torrent in 2010 was a totally new approach based on ion semiconductor sequencing (Rothberg et al. 2011). After Ion Torrent was purchased by Life Technologies, a platform with even higher throughput was released: the Ion Proton. In contrast to Sanger, Illumina and 454, this technique does not rely on analysing optical signals. Instead, changes of hydrogen ion concentration are analysed. Whenever a nucleotide is incorporated into a DNA strand by polymerase activity, a hydrogen ion (proton) is released (◘ Fig. 3.4). The release of this proton can be measured in real time by ion-sensitive field-effect transistors (ISFETs) (Sakurai and Husimi 1992). By using available methodology and software from modern imaging devices (laptops, digital cameras), an array has been built for the large-scale use of ISFETs. This technique is called the complementary metal-oxide semiconductor (CMOS) process (Rothberg et al. 2011). Every sensor in this array directly monitors the hydrogen ion release during sequencing. Each chip of the sequencer contains between 1 and 660 million sensors, each of which is composed of a well containing an acrylamide bead with a DNA template as well as the dNTPs. The chip size can be chosen according to the number of reads required for the sequencing project. As in 454 sequencing, the wells are flooded in cycles with one type of dNTP at a time. Below the well lies a metal-oxide sensing layer, which itself is on top of a sensor plate and a floating metal «gate» for the transmission of electronic information about the pH changes to the semiconductor (◘ Fig. 3.4). The detected changes in pH allow inferring if and how many bases have been incorporated into a sequence read. Relying on a purely electronic detection system without any optical components allows a comparatively low instrument cost compared to other NGS platforms.
Fig. 3.4
Principle of ion semiconductor sequencing. A well containing a bead with a DNA fragment is shown. Incorporation of a nucleotide releases a proton (H+), which changes the pH in the well. This release changes the potential in an underlying metal-oxide sensing layer, which is received by a transistor (Reprinted by permission from Macmillan Publishers Ltd: Nature (Rothberg et al. 2011), copyright 2011)
The library preparation is similar to the 454 technique. Adaptors are ligated to DNA molecules, which are loaded onto magnetic beads. Molecules are amplified using emulsion PCR. The wells are sized to fit a single bead, and the beads are loaded onto the chips by a centrifugation step.
Two different sequencing platforms are available (PGM and Ion Proton), and these can be run with differently sized chips. Chips for the PGM bear fewer sensors (PGM 314: 1.2 million; PGM 316: 6.1 million; PGM 318: 11 million), with an expected output of 500,000 to 5.5 million sequences. With a read length of up to 400 bp, a single run using the largest chip would generate an output of ~2 Gb. With 2 to 7 h, depending on the chip size, the runtime is relatively short compared with other techniques. The chips of the Ion Proton system are larger (PI, 165 million sensors; PII, 660 million). Using the largest size, ~330 million reads with up to 200 bp can be generated in a run which lasts between 2 and 4 h. This would equal an output of ~66 Gb. With ~98%, the accuracy is lower than for 454 and Illumina platforms (Quail et al. 2012; Merriman et al. 2012). Similar to the 454 technique, indels are the prevailing error type. However, the biggest advantage of this technique is speed. The library preparation should last less than 6 h, and sequencing runs are finished in a few hours. Using this approach, it is possible to get bacterial genomes completed from extracted DNA to assembly in less than 3 days. This rapid methodology has proven useful for monitoring and characterizing bacterial genomes during an E. coli outbreak in Germany in spring 2011 (Mellmann et al. 2011).

3.5 Single-Molecule Real-Time (SMRT) Sequencing (PacBio)

The sequencing techniques discussed so far produce rather short sequence reads, mostly below 1000 bp. Machines based on a new sequencing method targeting single molecules, which are able to produce considerably longer reads, have been available since 2011 from the company Pacific Biosciences (PacBio). Here, polymerase activity is monitored without interruption while incorporating four differently fluorescent-tagged dNTPs (Eid et al. 2009). These phospho-linked nucleotides carry a fluorescent label on the phosphate group of the nucleotide, which is cleaved away after incorporation. By using real-time imaging, incorporated nucleotides are detected while the new strand is synthesized along a single DNA template molecule. The detection takes place in a zero-mode waveguide (ZMW) microwell, which is a nanophotonic structure surrounded by aluminium. Each ZMW measures only 70 nm in diameter and 100 nm in depth, leaving an observation volume of 20 × 10⁻²¹ litres. A single molecule of Φ29 DNA polymerase is attached to the surface of the ZMW, and its activity can be measured (◘ Fig. 3.5). The small volume of the ZMWs reduces the amount of background noise due to the presence of fluorescent-labelled nucleotides. While detecting the level of fluorescence intensity in a single ZMW, a more or less stable background level is measured. An association of a phospho-linked nucleotide with the template DNA in the polymerase active site triggers a pulse of fluorescence intensity for the corresponding dye. This light emission lasts some milliseconds and is recorded by the detector of the ZMW. The fluorescence label is cleaved by the DNA polymerase, leaving a phosphodiester bond which allows the elongation of the DNA strand. The cleaved dye diffuses away, leading to a drop of the recorded emission intensity back to the background level. The next nucleotide can be incorporated, and the measurement repeats (◘ Fig. 3.5). The synthesis rate is around two to four bases per second.
In contrast to all other methods described so far, SMRT sequencing does not interrupt the process of DNA synthesis. Interestingly, it has been shown that the emission spectra contain more information besides the nucleotide identity. The duration of an emission and the interval between successive emissions also reveal information about nucleotide modifications. Using this data, a genome-wide mapping of methylation patterns becomes possible (Flusberg et al. 2010).
Fig. 3.5
Principle of single-molecule real-time (SMRT) sequencing. a A single molecule of Φ29 DNA polymerase is attached to the surface of a zero-mode waveguide (ZMW) microwell, and its activity while copying a DNA molecule is measured. b Emission spectra are detected, while fluorescent-labelled nucleotides are incorporated by the polymerase (Reprinted by permission of Pacific Biosciences)
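The pulse-based detection described for SMRT sequencing can be illustrated with a minimal sketch. It assumes one intensity trace per dye, sampled at regular time steps, and a fixed threshold above the background level. The trace values, the threshold and the function names are hypothetical, and this is not the base-calling algorithm used by PacBio. Pulse width and pulse start are reported as well, since the text notes that pulse duration and the interval between pulses carry information about base modifications.

```python
def detect_pulses(trace, threshold):
    """Return (start, end) index pairs of runs where intensity exceeds threshold."""
    pulses = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold and start is None:
            start = i                      # pulse begins
        elif value <= threshold and start is not None:
            pulses.append((start, i))      # pulse ends, back to background
            start = None
    if start is not None:                  # trace ended during a pulse
        pulses.append((start, len(trace)))
    return pulses


def call_bases(channel_traces, threshold):
    """Merge pulses of all four dye channels into (base, start, width) events."""
    events = []
    for base, trace in channel_traces.items():
        for start, end in detect_pulses(trace, threshold):
            events.append((start, end - start, base))
    events.sort()                          # order events by time of pulse start
    return [(base, start, width) for start, width, base in events]


# Hypothetical traces for one ZMW; background level is around 1
traces = {
    "A": [1, 1, 9, 9, 1, 1, 1, 1, 1, 1],
    "C": [1, 1, 1, 1, 1, 8, 8, 8, 1, 1],
    "G": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "T": [1, 1, 1, 1, 1, 1, 1, 1, 1, 9],
}
events = call_bases(traces, threshold=5)
print("".join(base for base, _, _ in events))  # ACT
print(events)  # [('A', 2, 2), ('C', 5, 3), ('T', 9, 1)]
```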
High-quality DNA is needed in high quantity for the library preparation, as no additional amplification step is included. Genomic DNA is sheared to the desired average DNA length, ends are repaired and hairpin adaptors are ligated to these ends. Hairpin adaptors represent single-stranded loops to which the sequencing primer can bind. The construct of the double-stranded template DNA flanked by two hairpin loops is called SMRTbell (Travers et al. 2010). Using a polymerase with strand displacement activity, a primer binding to the hairpin adaptor can be extended, displacing one DNA strand while the other is used as a template. Sequencing has even been achieved without any library preparation (Coupland et al. 2012). Whereas there seems to be no negative effect on the read length, the output was considerably lower than with standard library preparation. As no ligated adaptors are present, known primer regions or random hexamer primers can be used for sequencing.
SMRT sequencing has the advantage that the sequencing process is rather fast, taking about 4 h. The read length is long, averaging around 15 Kb, and reads may exceed lengths of 50 Kb (Lee et al. 2014). Moreover, single molecules are sequenced, and modifications are detected as additional information. However, a caveat of the technique is the high error rate. An accuracy of ~80–85% is given for single-pass reads (Hackl et al. 2014). Even though a large fraction of the sequencing errors seems to stem from deletions and insertions, no significant sequencing bias has been found (Ross et al. 2013). With 2.8 Gb of sequencing data per day (PacBio RSII), the output is rather low compared with other NGS techniques. Nevertheless, the availability of long reads from this technique dramatically increases the quality of genome and transcriptome assemblies (Koren and Phillippy 2015; Tilgner et al. 2014). Currently, long reads via SMRT sequencing represent the gold standard for the de novo assembly of genomes, as a more complete picture of gene content, structural variation and repeat biology can be achieved (Gordon et al. 2016).

3.6 Nanopore Sequencing

The principle of DNA (and RNA) sequencing using nanopores was first demonstrated in the mid-1990s (Kasianowicz et al. 1996). For the first sequencing experiments, a staphylococcal α-hemolysin protein pore was incorporated into a phospholipid bilayer separating two reservoirs filled with a salt solution. By applying a voltage using electrodes placed on opposite sides of the bilayer, negatively charged DNA molecules are forced to pass through the small nanopore channel, which has a diameter of a few nm. Nucleotides passing the pore characteristically decrease the amplitude of the ionic current and can thereby be detected. Using this system, even methylated cytosines can be distinguished from the four standard DNA bases (Clarke et al. 2009; Branton et al. 2008).
The company Oxford Nanopore Technologies (ONT) constructed a series of sequencing devices based on this technique, which are available (MinION) or currently entering the market (PromethION, SmidgION). The biggest problem of the technique is the immense speed at which the DNA strand is processed through the nanopore. This leads to a decrease of resolution when detecting nucleotides in the channel of the pore. Currently ONT is developing two different systems for DNA sequencing, exonuclease sequencing and strand sequencing (Clarke et al. 2009), of which sequencing data is available only for the latter at the time of writing this chapter.
In strand sequencing, double-stranded DNA is ratcheted through the nanopore by a «motor protein», a process by which it becomes single stranded. For library preparation, sheared DNA is end-repaired, and a hairpin adaptor is ligated to one end of the molecule, while the motor protein is ligated to the other (Goodwin et al. 2015). During sequencing, one strand passes the pore, followed by the hairpin adaptor and the other strand. If both strands are sequenced, consensus sequences of the two complementary strands are produced, which are termed 2D reads. Due to the speed of this process, multiple bases are present in the pore at a time, and a collection of overlapping k-mers (usually 5-mers, i.e. fragments of five consecutive nucleotides) is recorded as signal. Base calling, which is conducted by using the software MinKNOW as a cloud application, needs to distinguish 4⁵ (1024) possible ionic current states for all possible 5-mers. Not surprisingly, a high error rate is reported for all reads produced by this technique so far, averaging around 12% (Ip et al. 2015). However, several strategies for error correction are available (Loman et al. 2015; Goodwin et al. 2015), and the development of new sequencing chemistries has already improved the quality of reads dramatically. The development of this technique progresses so fast that numbers mentioned in this chapter will likely be already outdated when printed. ONT developed several sequencing platforms using this technique; however, at the time of writing this chapter in winter 2016, only data for the MinION was available. An early release of this device to selected laboratories was announced in 2013 (MinION access program) and realized in early 2014. Since 2015, the MinION, a miniature DNA sequencer the size of an MP3 player (◘ Fig. 3.6), has been commercially available. A starter pack including the sequencing device, two flow cells and a library preparation kit can be purchased for $1000.
Fig. 3.6
The MinION sequencing device from Oxford Nanopore Technologies (Picture reprinted with permission of Oxford Nanopore Technologies)
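The number of signal states that a base caller has to distinguish in strand sequencing, and the way a read decomposes into overlapping 5-mers, can be checked with a few lines of code. The example sequence is hypothetical, and the sketch only enumerates k-mers; it does not model ionic currents.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible k-mer over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def overlapping_kmers(sequence, k=5):
    """Return the consecutive, overlapping k-mers along a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(len(all_kmers(5)))             # 1024, i.e. 4**5 possible 5-mers
print(overlapping_kmers("ACGTACG"))  # ['ACGTA', 'CGTAC', 'GTACG']
```

Each step of the DNA strand through the pore shifts the window by one nucleotide, so neighbouring 5-mers share four bases, which a base caller can exploit when deciding between similar current levels.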
The MinION is equipped with 512 channels with four nanopores each, each of them detecting 50 to 250 bp per second depending on run mode and chemistry. With the R7 chemistry, an output ranging from 90 to 490 Mbp per 48 h is reported, with average read lengths around 6 kbp and maximum read lengths of up to 150 kbp (Ashton et al. 2015; Quick et al. 2014; Goodwin et al. 2015). Initial numbers for the currently distributed R9 chemistry are higher (Istace et al. 2016). Using MinION long reads, it was possible to assemble a complete Escherichia coli genome and, in a hybrid assembly with Illumina short reads, the yeast genome (Goodwin et al. 2015; Loman et al. 2015). With the genome of the nematode Caenorhabditis elegans, the first animal genome has been sequenced using MinION nanopore long reads only (Tyson et al. 2017). A system with higher output (PromethION) is currently being delivered to selected laboratories by ONT. Moreover, further developments exploring other biological nanopores or focussing on synthetic nanopores are under investigation (Feng et al. 2015; Wang et al. 2014). If the still rather high error rate can be decreased in future updates, nanopore sequencing might challenge PacBio’s status as the gold standard for whole-genome sequencing. Due to the speed and the quite simple library preparation, real-time monitoring in metagenomic frameworks and sequencing in the field are possible. For example, Ebola virus surveillance in the field using nanopore sequencing during an outbreak in Western Africa has been demonstrated (Quick et al. 2016). Moreover, methods like «real-time selective sequencing» will help to exploit the power of real-time sequencing (Loose et al. 2016). Currently, we are only beginning to see the potential of this technique unfold.

3.7 Comparison of Sequencing Platforms

A broad array of different sequencing techniques became commercially available in the last decade, revolutionizing the field of evolutionary genomics. Besides the techniques discussed here, some other platforms are available (e.g. SOLiD, Helicos) (Bowers et al. 2009; Valouev et al. 2008), which are less used in phylogenomic studies and have or will have their greatest potential in clinical studies screening nucleotide polymorphisms, ChIP-seq (chromatin immunoprecipitation DNA sequencing) or resequencing genomes. In 2015, the Beijing Genomics Institute released its sequencing platform called BGISEQ-500, which comes close to the output of Illumina’s HiSeq platforms (Goodwin et al. 2016). Several new approaches for DNA sequencing are under development, which are still years away from being commercially available, e.g. transmission electron microscopy DNA sequencing (Bell et al. 2012). The impact of the new sequencing techniques and how they transformed the field of genomics can be most easily seen in the dramatic decrease in sequencing costs. From 2000 to 2007, the price per raw megabase of DNA sequencing decreased in line with Moore’s law (Moore 1965). This basically means that the amount of sequence data that can be generated for a fixed price doubles approximately every 2 years (Mardis 2008). Starting in 2007, with the arrival of newly available sequencing platforms (454, Illumina), the costs per raw megabase decreased dramatically (◘ Fig. 3.7), basically giving small laboratories access to genome and transcriptome sequencing. The development of these new techniques made the $1000 human genome become reality (► see Infobox 3.1).
Fig. 3.7
Decrease in sequencing costs per raw megabase of DNA sequence over the last 15 years (Reprinted from: Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at: ► www.genome.gov/sequencingcostsdata. Accessed January 2017)
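The Moore’s law baseline against which the cost curve is compared can be written down in a few lines. This is a minimal sketch, assuming an exact doubling of output per unit price every 2 years; the function names are hypothetical, and the values are relative to the year 2000 rather than actual prices.

```python
def relative_cost(years_elapsed, doubling_years=2.0):
    """Cost of a fixed amount of sequence, relative to year 0, if the data
    obtainable for a fixed price doubles every `doubling_years` years."""
    return 0.5 ** (years_elapsed / doubling_years)


def data_for_fixed_budget(years_elapsed, doubling_years=2.0):
    """Amount of data obtainable for a fixed price, relative to year 0."""
    return 2.0 ** (years_elapsed / doubling_years)


# Expected trajectory for 2000 to 2007 under this assumption.
for year in range(2000, 2008):
    elapsed = year - 2000
    print(year, f"cost x{relative_cost(elapsed):.3f}",
          f"data x{data_for_fixed_budget(elapsed):.1f}")
```

Over 7 years this baseline predicts a cost reduction of about 11-fold (2 to the power of 3.5). Any steeper drop in the real data, as described for the period after 2007, outpaces Moore’s law.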
Sanger sequencing, which is still the gold standard in terms of read quality, is now also known as the first generation of sequencing. Second-generation sequencing platforms are 454, Illumina and Ion Torrent, all of them massively parallelized for high-throughput data generation, but restricted to short read lengths. The newly launched nanopore sequencers and PacBio’s SMRT sequencing are the third generation of sequencers, which have less output than second-generation machines, but are capable of single-molecule sequencing, which also allows the simultaneous detection of epigenetic modifications. While the read lengths of these machines are much longer than for all other available sequencing techniques, they are still error prone. However, the development of refined sequencing chemistry and technical updates of sequencing machines raise hope for higher-quality data in the near future.
With this number of different sequencing platforms available, varying in costs, quality and output (◘ Fig. 3.8), it becomes more difficult to decide strategically which technique should be used when planning phylogenomic studies and, if possible, which sequencers should be purchased by laboratories working in the field of evolution. By far, the highest output is generated by Illumina’s HiSeq platforms. The acquisition of a HiSeq is expensive, and these machines can usually only be fully exploited by sequencing centres or very large laboratories. The same is true for PacBio systems. Illumina’s MiSeq and Ion Torrent’s PGM are affordable for smaller labs. However, the cost per base for these machines is usually much higher than for Illumina’s HiSeq (◘ Fig. 3.8). Nevertheless, these machines are well suited for targeted sequencing strategies or sequencing of complete prokaryote genomes. Genome projects dealing with eukaryote genomes should envisage a hybrid strategy combining a high coverage of short-read data from second-generation sequencers (e.g. Illumina) and low coverage of long reads from the third generation (e.g. PacBio, Nanopore). Assemblies solely based on long reads might result in the highest quality, but could still be too expensive for most projects.
Fig. 3.8
Output (in bp), runtime (in days) and costs (in $) of different sequencing platforms discussed in this chapter. The cost of resequencing a human genome with 30× coverage is indicated (hg-30×) (Reprinted with permission of Albert Vilella, ► http://twitter.com/albertvilella)
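When planning a project along the lines of the hybrid strategy described above, the required sequencing output follows directly from genome size and target coverage. The sketch below is an illustration under stated assumptions: the 100 Mbp genome, the 150 bp short-read length and the 10× long-read coverage are hypothetical planning values, the human genome size of 3.1 Gbp is an approximation, and only the 30× coverage and the 6 kbp average long-read length are taken from this chapter.

```python
def bases_needed(genome_size_bp, coverage):
    """Total bases to sequence for a given average coverage."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads of a given length needed, rounded up."""
    total = bases_needed(genome_size_bp, coverage)
    return (total + read_length_bp - 1) // read_length_bp


GENOME_BP = 100_000_000          # hypothetical eukaryote genome (assumption)
HUMAN_GENOME_BP = 3_100_000_000  # approximate human genome size (assumption)

# Hybrid design: high short-read coverage plus low long-read coverage.
short_reads = reads_needed(GENOME_BP, coverage=30, read_length_bp=150)
long_reads = reads_needed(GENOME_BP, coverage=10, read_length_bp=6_000)

print("short reads needed:", short_reads)
print("long reads needed: ", long_reads)
print("bases for a 30x human genome:", bases_needed(HUMAN_GENOME_BP, 30))
```

Comparing such totals with the per-run output of each platform gives the number of runs, and hence a first cost estimate, for a given design.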

3.7.1 Infobox 3.1 The $1000 Genome

Deciphering the human genome took an international collaboration more than a decade, and the costs were estimated to be around 3 billion US dollars when it was announced in 2001. It became clear that sequencing costs had to be reduced dramatically if subsequent studies targeting haplotype diversity across humans and the use of genomic analysis for routine medical applications were to be realized. Discussions about this topic in an expert round delivered the catchphrase that the «$1000 genome» should be targeted. When announced in the early 2000s, this goal seemed utopian and was first revised to target the «$100,000 genome», which would still be a more than 100-fold decrease in sequencing costs. It is important to keep in mind that a 100-fold decrease in costs would allow a 100-fold increase in data for the same price, adding much needed statistical power for many desired studies targeting the genetic background of diseases. To achieve these goals, the US-based National Human Genome Research Institute (NHGRI) launched programs actively funding sequencing technology developments. Several sequencing centres and university start-up companies greatly benefited from this rich source of funding. 454 pyrosequencing, the first of the many NGS techniques to become available, was directly supported by these programs. Using this technique, the complete genome of James Watson, the scientist who was directly involved in the discovery of the DNA double helix, was sequenced in less than 2 months for well under 1 million US dollars (Wolinsky 2007; Wheeler et al. 2008). The advent and development of Illumina sequencing again strongly decreased the costs. Already in 2013, costs around $5000 for a 30× coverage human genome with Illumina short reads were estimated by the NHGRI.
The launch of Illumina’s HiSeq X Ten system, which is an array of ten sequencers with massive output, finally achieved the goal of sequencing human genomes for less than $1000 in early 2014. In January 2017, Illumina announced that with the NovaSeq sequencing platform, it might be possible in the near future to sequence a human genome for $100. However, there is no real consensus on how to calculate these costs, as additional costs for personnel, electricity and analysis should be added to the pure costs for sequencing. Whereas genome sequencing became extremely cheap, costs for analysing all these data remain high, as highly trained scientists have to do this step. Or as Elaine Mardis famously phrased it, «The $1000 genome, the $100,000 analysis?» (Mardis 2010).
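The cost milestones mentioned in this infobox can be compared as fold decreases. The following short sketch uses only the figures quoted above; the labels and function name are illustrative, the Watson genome is entered at its stated upper bound of 1 million US dollars, and the NovaSeq value is an announced target rather than an achieved price.

```python
# Cost per human genome in US dollars, as quoted in the infobox.
MILESTONES = [
    ("Human genome project (estimate)", 3_000_000_000),
    ("Watson genome, 454 (upper bound)", 1_000_000),
    ("Illumina 30x, NHGRI estimate 2013", 5_000),
    ("HiSeq X Ten, 2014", 1_000),
    ("NovaSeq (announced target)", 100),
]


def fold_decrease(old_cost, new_cost):
    """How many times cheaper the new cost is; equally, how many times more
    genomes the old budget would now buy."""
    return old_cost / new_cost


baseline = MILESTONES[0][1]
for label, cost in MILESTONES[1:]:
    print(f"{label}: {fold_decrease(baseline, cost):,.0f}-fold cheaper")
```

Read the other way round, each fold decrease is the factor by which the amount of data obtainable for the original budget increases, which is the point made above about statistical power.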
References
Ashton PM, Nair S, Dallman T, Rubino S, Rabsch W, Mwaigwisya S, Wain J, O’Grady J (2015) MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat Biotechnol 33:296–300
Bell DC, Thomas WK, Murtagh KM, Dionne CA, Graham AC, Anderson JE, Glover WR (2012) DNA base identification by electron microscopy. Microsc Microanal 18:1049–1053
Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IMJ, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DMD, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E, Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang G-D, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, PG MC, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O’Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, 
Turcatti G, Vandevondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, NJ MC, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456:53–59
Bowers J, Mitchell J, Beer E, Buzby PR, Causey M, Efcavitch JW, Jarosz M, Krzymanska-Olejnik E, Kung L, Lipson D, Lowman GM, Marappan S, McInerney P, Platt A, Roy A, Siddiqi SM, Steinmann K, Thompson JF (2009) Virtual terminator nucleotides for next-generation DNA sequencing. Nat Methods 6:593–595
Branton D, Deamer DW, Marziali A, Bayley H, Benner SA, Butler T, Di Ventra M, Garaj S, Hibbs A, Huang X, Jovanovich SB, Krstic PS, Lindsay S, Ling XS, Mastrangelo CH, Meller A, Oliver JS, Pershin YV, Ramsey JM, Riehn R, Soni GV, Tabard-Cossa V, Wanunu M, Wiggin M, Schloss JA (2008) The potential and challenges of nanopore sequencing. Nat Biotechnol 26:1146–1153
Brenner S (2014) Frederick Sanger (1918–2013). Science 343:262
Clarke J, Wu H-C, Jayasinghe L, Patel A, Reid S, Bayley H (2009) Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol 4:265–270
Coupland P, Chandra T, Quail M, Reik W, Swerdlow H (2012) Direct sequencing of small genomes on the Pacific Biosciences RS without library preparation. BioTechniques 53:365–372
Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, deWinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, Lundquist P, Ma C, Marks P, Maxham M, Murphy D, Park I, Pham T, Phillips M, Roy J, Sebra R, Shen G, Sorenson J, Tomaney A, Travers K, Trulson M, Vieceli J, Wegener J, Wu D, Yang A, Zaccarin D, Zhao P, Zhong F, Korlach J, Turner S (2009) Real-time DNA sequencing from single polymerase molecules. Science 323:133–138
Feng Y, Zhang Y, Ying C, Wang D, Du C (2015) Nanopore-based fourth-generation DNA sequencing technology. Genomics Proteomics Bioinformatics 13:4–16
Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, Korlach J, Turner SW (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 7:461–465
Gilles A, Meglecz E, Pech N, Ferreira S, Malausa T, Martin J-F (2011) Accuracy and quality assessment of 454 GS-FLX titanium pyrosequencing. BMC Genomics 12:245
Goodwin S, Gurtowski J, Ethe-Sayers S, Deshpande P, Schatz MC, McCombie WR (2015) Oxford nanopore sequencing, hybrid error correction, and de novo assembly of a eukaryotic genome. Genome Res 25:1750–1756
Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17:333–351
Gordon D, Huddleston J, Chaisson MJP, Hill CM, Kronenberg ZN, Munson KM, Malig M, Raja A, Fiddes I, Hillier LW, Dunn C, Baker C, Armstrong J, Diekhans M, Paten B, Shendure J, Wilson RK, Haussler D, Chin C-S, Eichler EE (2016) Long-read sequence assembly of the gorilla genome. Science 352:aae0344
Hackl T, Hedrich R, Schultz J, Förster F (2014) Proovread: large-scale high-accuracy PacBio correction through iterative short read consensus. Bioinformatics 30:3004–3011
Huse S, Huber J, Morrison H, Sogin M, Welch D (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8:R143
Ip C, Loose M, Tyson J, de Cesare M, Brown B, Jain M, Leggett R, Eccles D, Zalunin V, Urban J, Piazza P, Bowden R, Paten B, Mwaigwisya S, Batty E, Simpson J, Snutch T, Birney E, Buck D, Goodwin S, Jansen H, O’Grady J, Olsen H (2015) MinION Analysis and Reference Consortium: Phase 1 data release and analysis [version 1; referees: 2 approved]. F1000Res 4:1075
Istace B, Friedrich A, d’Agata L, Faye S, Payen E, Beluche O, Caradec C, Davidas S, Cruaud C, Liti G, Lemainque A, Engelen S, Wincker P, Schacherer J, Aury J-M (2016) de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer. bioRxiv. doi.org/10.1101/066613
Kasianowicz JJ, Brandin E, Branton D, Deamer DW (1996) Characterization of individual polynucleotide molecules using a membrane channel. Proc Natl Acad Sci U S A 93:13770–13773
Kircher M, Kelso J (2010) High-throughput DNA sequencing – concepts and limitations. BioEssays 32:524–536
Koren S, Phillippy AM (2015) One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly. Curr Opin Microbiol 23:110–120
Lee H, Gurtowski J, Yoo S, Marcus S, McCombie WR, Schatz M (2014) Error correction and assembly complexity of single molecule sequencing reads. bioRxiv:006395
Liu L, Li Y, Li S, Hu N, He Y, Pong R, Lin D, Lu L, Law M (2012) Comparison of next-generation sequencing systems. J Biomed Biotechnol 2012:251364
Loman NJ, Quick J, Simpson JT (2015) A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 12:733–735
Loose M, Malla S, Stout M (2016) Real-time selective sequencing using nanopore technology. Nat Methods 13:751–754
Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24:133–141
Mardis E (2010) The $1,000 genome, the $100,000 analysis? Genome Med 2:84
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen Y-J, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim J-B, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380
Maxam AM, Gilbert W (1977) A new method for sequencing DNA. Proc Natl Acad Sci U S A 74:560–564
Mellmann A, Harmsen D, Cummings CA, Zentz EB, Leopold SR, Rico A, Prior K, Szczepanowski R, Ji Y, Zhang W, McLaughlin SF, Henkhaus JK, Leopold B, Bielaszewska M, Prager R, Brzoska PM, Moore RL, Guenther S, Rothberg JM, Karch H (2011) Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS One 6:e22751
Merriman B, Ion Torrent R&D Team, Rothberg JM (2012) Progress in ion torrent semiconductor chip based sequencing. Electrophoresis 33:3397–3417
Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31–46
Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010:pdb.prot5448
Moore G (1965) Cramming more components onto integrated circuits. Electronics 38:114–117
Petrosino JF, Highlander S, Luna RA, Gibbs RA, Versalovic J (2009) Metagenomic pyrosequencing and microbial identification. Clin Chem 55:856–866
Prober J, Trainor G, Dam R, Hobbs F, Robertson C, Zagursky R, Cocuzza A, Jensen M, Baumeister K (1987) A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238:336–341
Quail M, Smith M, Coupland P, Otto T, Harris S, Connor T, Bertoni A, Swerdlow H, Gu Y (2012) A tale of three next generation sequencing platforms: comparison of ion torrent, Pacific biosciences and Illumina MiSeq sequencers. BMC Genomics 13:341
Quick J, Quinlan A, Loman N (2014) A reference bacterial genome dataset generated on the MinION portable single-molecule nanopore sequencer. Gigascience 3:22
Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, Cowley L, Bore JA, Koundouno R, Dudas G, Mikhail A, Ouédraogo N, Afrough B, Bah A, Baum JHJ, Becker-Ziaja B, Boettcher JP, Cabeza-Cabrerizo M, Camino-Sánchez Á, Carter LL, Doerrbecker J, Enkirch T, Dorival IG, Hetzelt N, Hinzmann J, Holm T, Kafetzopoulou LE, Koropogui M, Kosgey A, Kuisma E, Logue CH, Mazzarelli A, Meisel S, Mertens M, Michel J, Ngabo D, Nitzsche K, Pallasch E, Patrono LV, Portmann J, Repits JG, Rickett NY, Sachse A, Singethan K, Vitoriano I, Yemanaberhan RL, Zekeng EG, Racine T, Bello A, Sall AA, Faye O, Faye O, Magassouba NF, Williams CV, Amburgey V, Winona L, Davis E, Gerlach J, Washington F, Monteil V, Jourdain M, Bererd M, Camara A, Somlare H, Camara A, Gerard M, Bado G, Baillet B, Delaune D, Nebie KY, Diarra A, Savane Y, Pallawo RB, Gutierrez GJ, Milhano N, Roger I, Williams CJ, Yattara F, Lewandowski K, Taylor J, Rachwal P, Turner DJ, Pollakis G, Hiscox JA, Matthews DA, MKO S, Johnston AM, Wilson D, Hutley E, Smit E, Di Caro A, Wölfel R, Stoecker K, Fleischmann E, Gabriel M, Weller SA, Koivogui L, Diallo B, Keïta S, Rambaut A, Formenty P, Günther S, Carroll MW (2016) Real-time, portable genome sequencing for Ebola surveillance. Nature 530:228–232
Ronaghi M (2001) Pyrosequencing sheds light on DNA sequencing. Genome Res 11:3–11
Ronaghi M, Karamohamed S, Pettersson B, Uhlén M, Nyrén P (1996) Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem 242:84–89
Ross M, Russ C, Costello M, Hollinger A, Lennon N, Hegarty R, Nusbaum C, Jaffe D (2013) Characterizing and measuring bias in sequence data. Genome Biol 14:R51
Rothberg JM, Leamon JH (2008) The development and impact of 454 sequencing. Nat Biotechnol 26:1117–1124
Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT, Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao X, Reed B, Sabina J, Feierstein E, Schorn M, Alanjary M, Dimalanta E, Dressman D, Kasinskas R, Sokolsky T, Fidanza JA, Namsaraev E, McKernan KJ, Williams A, Roth GT, Bustillo J (2011) An integrated semiconductor device enabling non-optical genome sequencing. Nature 475:348–352
Sakurai T, Husimi Y (1992) Real-time monitoring of DNA polymerase reactions by a micro ISFET pH sensor. Anal Chem 64:1996–1997
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 74:5463–5467
Tilgner H, Grubert F, Sharon D, Snyder MP (2014) Defining a personal, allele-specific, and single-molecule long-read transcriptome. Proc Natl Acad Sci U S A 111:9869–9874
Travers KJ, Chin C-S, Rank DR, Eid JS, Turner SW (2010) A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res 38:e159
Tyson JR, O’Neil NJ, Jain M, Olsen HE, Hieter P, Snutch TP (2017) Whole genome sequencing and assembly of a Caenorhabditis elegans genome with complex genomic rearrangements using the MinION sequencing device. bioRxiv. doi.org/10.1101/099143
Valouev A, Ichikawa J, Tonthat T, Stuart J, Ranade S, Peckham H, Zeng K, Malek JA, Costa G, McKernan K, Sidow A, Fire A, Johnson SM (2008) A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res 18:1051–1063
Wang Y, Yang Q, Wang Z (2014) The evolution of nanopore sequencing. Front Genet 5:449
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen Y-J, Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski JR, Chinault C, X-z S, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M, Weinstock GM, Gibbs RA, Rothberg JM (2008) The complete genome of an individual by massively parallel DNA sequencing. Nature 452:872–876
Wolinsky H (2007) The thousand-dollar genome. EMBO Rep 8:900–903
Yu DW, Ji Y, Emerson BC, Wang X, Ye C, Yang C, Ding Z (2012) Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods Ecol Evol 3:613–623