- Sanger sequencing is based on the chain termination method which relies on separating DNA by size and the incorporation of labelled modified nucleotides.
- 454 pyrosequencing measures the amount of light produced by the incorporation of nucleotides in a cascade of enzymatic reactions under the presence of a luciferase.
- Reversible terminator sequencing is a sequencing-by-synthesis approach where the incorporation of modified nucleotides is detected stepwise.
- Ion semiconductor sequencing analyses changes of hydrogen ion concentration during the incorporation of nucleotides into the DNA strand.
- Single-molecule real-time (SMRT) sequencing monitors without interruption the incorporation of differently fluorescent-tagged nucleotides by the polymerase activity.
- Nanopore sequencing detects the identity of nucleotides within the DNA strand while it is passing through a nanopore.
- The availability of next-generation sequencing (NGS) platforms transformed the field of genomics and led to a dramatic decrease in sequencing costs.
3.1 Sanger Sequencing
Two DNA sequencing techniques were
developed in the mid-1970s. Allan Maxam and Walter Gilbert proposed
a chemical cleavage method, which was initially widely used in
molecular laboratories (Maxam and Gilbert 1977). Around the same time, Frederick Sanger
and colleagues developed a chain termination method (Sanger et al.
1977). After the chemistry needed
for this method became commercially available, Sanger sequencing
became the standard sequencing technique for all applications. For
nearly three decades, Sanger sequencing was synonymous with DNA
sequencing. DNA sequencing revolutionized many fields of biological
and medical sciences, and fittingly Frederick Sanger and Walter
Gilbert were awarded the Nobel Prize in Chemistry in 1980, which
they shared with Paul Berg (Brenner 2014).
Sanger’s chain termination method is
based on two principles: (I) DNA can be separated by size
and (II) DNA polymerases are able to incorporate modified
nucleotides. As DNA is negatively charged, gel electrophoresis
allows the separation of DNA strands by size, as larger DNA strands
migrate more slowly towards the positive electrode. By using polyacrylamide
gels, it is even possible to resolve DNA strands that differ in
length by a single nucleotide. After separating DNA
into single strands, one strand can be replicated by a
DNA polymerase, which adds one nucleotide after
another complementary to the template DNA strand. Requirements are
a DNA primer (oligonucleotide) that fits to the template and the
availability of nucleotides (dNTPs) for incorporation. Nucleotides
bear a 3′ OH group and a 5′ phosphate group which will be connected
by the polymerase while removing two of the phosphates of the 5′
end. However, if the 3′ OH group is missing, it is impossible to
connect another nucleotide, and the elongation of the newly
synthesized strand is terminated. Sanger used exactly such modified
nucleotides, which have an H at the 3′ end instead of the OH group
(dideoxynucleotides, ddNTPs), for his sequencing reaction. The
sequencing reaction mix is composed of DNA polymerase, a primer for
the target region, and a mix of dNTPs and ddNTPs. The ratio of
dNTPs to ddNTPs is usually around 100 to 1, so that termination
occurs at least once for every position of the DNA template
during the amplification process. Whenever a ddNTP is incorporated,
the elongation of the growing strand stops, thereby generating DNA
molecules of different sizes. Using electrophoresis, these
differently sized DNA molecules can be separated, and the identity
of the incorporated ddNTP allows the nucleotide
sequence to be reconstructed (◘ Fig. 3.1). Initially, ddNTPs were labelled
radioactively, and reactions were performed for each of the four
bases separately. Huge gels had to be inspected by eye to
reconstruct the sequence order, a reason why DNA output from
sequencers is still known as reads. Later on, unique fluorescent
labels for all four different bases (adenine, thymine, guanine
and cytosine) were introduced, which could be detected by a laser
while migrating through the gel. This innovation strongly decreased
analysis time, as all four reactions could be run together, and
increased the accuracy of base detection (Prober et al.
1987). Computer-based analysis
automatically associates the detected fluorescence
with the corresponding nucleotide to generate a
chromatogram for every sequence read (◘ Fig. 3.1).
Fig. 3.1 Chromatogram of a Sanger sequencing run.
All four bases are labelled differently and are separated according
to their size
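The chain termination principle described above can be illustrated with a minimal sketch. It assumes an idealized reaction in which termination happens exactly once at every position, and it represents each fragment only by its length and its terminal labelled base; the function names are hypothetical and chosen for illustration.

```python
import random


def chain_termination_fragments(new_strand: str) -> list[tuple[int, str]]:
    """Return one (fragment length, terminal ddNTP base) pair per position.

    Idealized assumption: a ddNTP is incorporated once at every position.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Sort fragments by size, as electrophoresis does, and read the labels."""
    return "".join(base for _, base in sorted(fragments))


fragments = chain_termination_fragments("GATTACA")
random.Random(0).shuffle(fragments)  # fragments are mixed in the reaction tube
print(read_ladder(fragments))  # GATTACA
```

Sorting by fragment length plays the role of the gel or capillary: the shortest fragment reveals the first base after the primer, the next longer one the second base, and so on.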
Current machines for high-throughput
analyses like the ABI 3730xl are equipped with 96 capillaries.
Depending on the envisaged quality, a single sequencing run
takes 2–3 h and yields high-quality sequence reads of up to 1000 bp.
The output of a single run is around 100 Kb, and
sequencing of 1 Mb within 24 h might be realistic (Liu et al.
2012). After trimming the ends of
sequencing reads, an accuracy of 99.999% can be achieved, and
sequence errors are mainly due to errors in preceding amplification
steps (Kircher and Kelso 2010). As
such, Sanger sequencing remains a good option for high-quality
sequence reads. The first human genome sequence was generated
with this technique.
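The throughput figures quoted above follow from simple arithmetic, sketched here under the simplifying assumptions that every capillary yields a full-length read and that runs follow each other without any loading time. The function name and parameters are hypothetical.

```python
def sanger_output(capillaries: int, read_length_bp: int, run_hours: float) -> tuple[int, float]:
    """Return (bases per run, bases per 24 h) for back-to-back runs."""
    bases_per_run = capillaries * read_length_bp
    bases_per_day = bases_per_run * 24 / run_hours
    return bases_per_run, bases_per_day


for hours in (2, 3):
    per_run, per_day = sanger_output(capillaries=96, read_length_bp=1000, run_hours=hours)
    print(f"{hours} h runs: {per_run} bp per run, {per_day:.0f} bp per day")
```

With 96 capillaries and 1000 bp reads, a run gives 96,000 bp, which matches the roughly 100 Kb stated in the text, and 8 to 12 runs per day bracket the 1 Mb estimate.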
3.2 454 Pyrosequencing
The first next-generation sequencing
(NGS) technique that became commercially available was 454
sequencing. This technique presented solutions for each of the
bottlenecks typically found in large-scale classical Sanger
sequencing: library preparation, template preparation and
sequencing itself (Rothberg and Leamon 2008). The principle for 454 sequencing goes
back to a real-time sequencing approach called pyrosequencing
developed by Ronaghi et al. (1996), and a highly parallelized, automated high-throughput
implementation was presented by Margulies et al. (2005). The great advantage of this method
compared to Sanger sequencing is that sequences are read out while
being synthesized, therefore omitting electrophoresis steps for
size separation of DNA fragments. In this approach, nucleotides are
released one type after another and washed over the template DNA strand.
A cascade of enzymatic reactions leads to the emission of
a detectable light signal whose strength is proportional to
the number of nucleotides incorporated during this step. For
example, if C’s are washed over the template DNA, the polymerase
incorporates as many consecutive C’s as are present in the target
sequence. For each nucleotide incorporated by the polymerase, a
pyrophosphate is released, which is
subsequently converted to ATP by ATP sulphurylase (◘ Fig. 3.2).
Luciferases present in the reaction use this energy to oxidize luciferin, which leads
to the generation of light (Ronaghi 2001). After detection of the signal,
superfluous nucleotides are removed in a washing step, and the next
nucleotide is provided in a subsequent flow cycle.
Fig. 3.2 Principles of 454 pyrosequencing in
picotiter well plates. The dNTP cytosine is flushed over the wells
containing beads carrying DNA fragments and reaction mix.
Incorporation of a dNTP by the polymerase leads to the release of a
pyrophosphate, which is converted to ATP by ATP sulphurylase.
Luciferases present in the well use this energy to oxidize luciferin, which
leads to the generation of light (Reprinted by permission from
Macmillan Publishers Ltd: Nature
Reviews Genetics (Metzker 2010), copyright 2009)
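The flow-by-flow readout of pyrosequencing can be sketched as follows. This is an idealized illustration, not the vendor's signal processing: it assumes noise-free light signals that equal the homopolymer length exactly, and the flow order `TACG` as well as the function names are assumptions made for the example.

```python
def flowgram(new_strand: str, flow_order: str = "TACG", max_flows: int = 400) -> list[tuple[str, int]]:
    """Simulate ideal signals: one (nucleotide, number incorporated) pair per flow."""
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        # The polymerase incorporates the flowed nucleotide as often as it fits.
        while pos < len(new_strand) and new_strand[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))  # light intensity is proportional to count
        flow += 1
    return signals


def decode(signals: list[tuple[str, int]]) -> str:
    """Rebuild the synthesized sequence from the per-flow signals."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("TTACCCG")
print(signals)          # [('T', 2), ('A', 1), ('C', 3), ('G', 1)]
print(decode(signals))  # TTACCCG
```

Because a homopolymer is read as a single signal whose strength must be converted into a count, small intensity errors translate into inserted or deleted bases.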
For the preparation of sequencing
libraries, DNA is linked with special adaptors by sequential
ligation and subsequently separated into single strands. One of
these adaptors allows the binding to beads by hybridization. During
a process called emulsion PCR, library fragments are loaded onto
beads and amplified. To do this, an aqueous mixture containing the
capture beads, sequencing libraries and PCR reagents is mixed with
synthetic oil in a plastic vessel. Vigorous shaking leads to the
formation of droplets around the beads, which by chance will usually
contain a single DNA library fragment. As each droplet also contains
all PCR reagents, amplification of the library fragment
produces millions of copies which bind to the capture bead.
After finishing the PCR, beads are cleaned of the oil, and
those which do not carry DNA are removed. A key to high-throughput
sequencing with this method is the use of picotiter well plates for
sequencing (◘ Fig. 3.2). These plates bear approximately 1.6
million wells with a volume of 75 picoliters (Margulies et al.
2005). These wells are designed to
exactly fit a single bead which carries the DNA fragments to be
sequenced. Each well is filled with a capture bead and smaller
beads carrying enzymes like ATP sulphurylase and luciferases. As
described above, free nucleotides are washed over the well plate,
and light emission is detected by a high-resolution camera. The
number of filled wells determines the number of sequences to be
generated in a single run.
Sequencers of Roche’s 454 GS FLX
series can produce around 1,000,000 sequences with an average read
length of 700 bp and reads up to 1000 bp. A single run takes
around 24 h, with an output of 700 Mb altogether. Sequences
generated with 454 are of high quality, with an accuracy of
~99.75% after removing reads with N’s (Huse et al. 2007). Sequencing errors are mainly indels
(insertions and deletions), often found in stretches of homopolymers
(Gilles et al. 2011). Due to its
considerably lower output in comparison to other methods, 454
sequencing lost its relevance for genome projects. However,
especially for high-throughput metabarcoding studies based on
amplicon sequencing, this technique is still frequently used (Yu et
al. 2012; Petrosino et al.
2009). Nevertheless, Roche decided to
stop supporting the 454 platform in 2016.
3.3 Reversible Terminator Sequencing (Illumina)
By far the most widely used NGS
platform is Illumina sequencing. Also known under its former
company name as Solexa sequencing, this technique is based on
cyclic reversible terminator technology. The sequencing reaction
takes place on a flow cell, where literally billions of sequences
can be processed during a single run. Based on a
sequencing-by-synthesis approach, all four nucleotides are added
simultaneously to the flow cell, together with a polymerase.
Similar to Sanger sequencing, strand synthesis is stopped after
incorporating a modified base. Every incorporated nucleotide is
chemically blocked at its 3′ OH group and carries a removable
fluorophore which can be identified by laser. Most Illumina
sequencers (GA II, HiSeq, MiSeq) use a system with four colours,
one for each base. Based on four different images coming from four
different colour channels, bases are called for each sequence. The
Illumina NextSeq system uses only two channels for base detection.
Here, red refers to a C and green to a T, mixed signals (red and
green) are interpreted as an A, and the absence of any dye signal refers to a G.
As only two images are needed for base calling, the detection
becomes faster. After detection, the blocking group and the fluorophore
are removed, and the process of sequencing is continued by
incorporating the next nucleotide (Bentley et al. 2008). In contrast to Sanger sequencing, all
incorporated nucleotides terminate the elongation process and carry
fluorophores.
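The two-channel encoding described above for the NextSeq system can be written down as a small lookup table. The sketch below is illustrative only: the intensity values and the fixed threshold are hypothetical, and real base callers work on image data with far more elaborate signal models.

```python
# (red signal present, green signal present) -> base, as described in the text
TWO_CHANNEL_CODE = {
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (True, True): "A",    # red and green
    (False, False): "G",  # no dye signal
}


def call_bases(red: list[float], green: list[float], threshold: float = 0.5) -> str:
    """Call one base per cycle from two intensity channels (hypothetical scale 0..1)."""
    return "".join(
        TWO_CHANNEL_CODE[(r > threshold, g > threshold)]
        for r, g in zip(red, green)
    )


red = [0.9, 0.1, 0.8, 0.0]
green = [0.1, 0.9, 0.7, 0.1]
print(call_bases(red, green))  # CTAG
```

Four bases are encoded with two images per cycle instead of four, which is why detection becomes faster.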
For library preparation, two different
adaptors (P5 and P7) are ligated to the ends of all DNA molecules
(◘ Fig. 3.3a).
In an additional step, indices can be added by an indexing PCR
creating unique libraries, which allows pooling during sequencing.
In contrast to other barcoding approaches, the indices are placed
within the adaptors and not at the ends of the molecules to be
sequenced (Meyer and Kircher 2010). The resulting library is amplified with
longer primers which further extend the adaptor sequences. In the
next steps, the double-stranded library will be separated into
single strands which are pumped into the flow cell. Two different
types of short oligonucleotides which are complementary to the ends
of the library adaptors are distributed as anchors across the flow
cell. These anchors can hybridize to the ends of the adaptors, and by
adding nucleotides and a polymerase, single-stranded molecules are
copied starting from the hybridized anchor region. The newly
synthesized double-stranded molecule is denatured, and the original
template DNA strand is washed away. As a result, all newly
synthesized strands are covalently attached to the flow cell. In a
step called bridge amplification, the single strands start
to bend over to hybridize at their end with adjacent free anchor
oligonucleotides (◘ Fig. 3.3b). Again, the hybridized primer is extended
by polymerase, which leads to the formation of a double-stranded
bridge. This bridge is denatured, resulting in two copies of
covalently bound single-stranded DNA templates. The bridge
amplification step is repeated several times until clusters of some
thousand copies are generated. In the end, the bridges are again
denatured, and all reverse strands are cleaved and washed away.
After blocking of the free 3′ ends, the sequencing as described
above begins. During sequencing, the number of cycles determines
the number of nucleotides to be sequenced. With current machines
and chemistry, usually 96–250 cycles are used to produce sequences
of corresponding length. It is also possible to use paired-end (PE)
sequencing, which means that both ends of a single molecule from
the sequencing library will be determined. In this case, the
blocking of the free sequence ends has to be removed, and a new
bridge is formed due to bending of the sequence ends to
corresponding adjacent anchors (◘ Fig. 3.3c). The single strand
is replicated using a polymerase, and afterwards double strands are
separated again. This time, all original forward strands are
cleaved and washed away, and after blocking free ends, sequencing
begins again.
Fig. 3.3 Illumina sequencing library preparation and
bridge amplification. a For library preparation, two different
adaptors are ligated to the end of sheared DNA fragments. b
The sequencing library is pumped into the flow cell and can bind to
short oligonucleotides on its surface, which are complementary to
adaptor sequences. Single-stranded molecules are copied starting
from the hybridized anchor region. Newly synthesized
double-stranded molecules are denatured, and the original template
DNA strand is washed away. Single-stranded strands start to bend
over to hybridize at their end with adjacent free anchor
oligonucleotides, thereby building a bridge. Multiple PCR cycles
are used to generate clusters of clonal sequences. c To generate
single-stranded templates for sequencing by synthesis, DNA fragments
are linearized by cleavage within one adaptor sequence and
subsequently denatured. For paired-end sequencing, the DNA template
forms a bridge, and the second strand of the DNA fragment is
synthesized. This time, cleavage will take place in the opposite
adaptor region to provide a template for sequencing (Reprinted by
permission from Macmillan Publishers Ltd: Nature (Bentley et al.
2008), copyright 2008)
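How many rounds of bridge amplification are needed to reach clusters of "some thousand copies" can be estimated with a back-of-the-envelope sketch. It assumes ideal doubling in every round, which real flow cells will not achieve, so the numbers are lower bounds; the function names are hypothetical.

```python
def copies_after(cycles: int) -> int:
    """Copies per cluster after a number of rounds, assuming perfect doubling."""
    return 2 ** cycles


def cycles_for(target_copies: int) -> int:
    """Smallest number of ideal doubling rounds that reaches target_copies.

    (n - 1).bit_length() is an exact integer form of ceil(log2(n)) for n >= 1.
    """
    return (target_copies - 1).bit_length()


for target in (1000, 4000):
    n = cycles_for(target)
    print(f"{target} copies need at least {n} rounds ({copies_after(n)} copies)")
```

Under this assumption about 10 to 12 rounds would suffice for a cluster of a few thousand clonal strands.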
Illumina sequencing can currently
generate the biggest output of sequence data, and several different
platforms are available. The Illumina HiSeq models are able to
generate around 600 million sequences per lane of a flow cell, which
comprises eight lanes altogether, resulting in nearly 5 billion
sequences for a full run. Depending on the number of cycles used,
this would produce 750 Gb of sequence output for 150 cycles. Including
copying the data, such a run would last around 10 days. Faster
«rapid» runs using flow cells with only two lanes, which also
produce «only» around 66% of the output per lane, are available,
and these can be finished in 24 h. In January 2017, Illumina
announced the release of the NovaSeq models, which will replace the
HiSeq series in the near future. Using NovaSeq, an output of up to
3 Tb in a 40 h run is envisaged. Two Illumina machines are
available as desktop machines: the NextSeq 500 and the MiSeq.
The NextSeq 500 platform runs a single lane with an output of up to
400 million sequences. The Illumina MiSeq is the cheapest model,
and depending on the model, the output of a single lane flow cell
is between 5 and 25 million sequences. At ~99.25%, the accuracy of Illumina
sequencing is slightly lower than that of Sanger or 454
(Quail et al. 2012). However, the
error profile differs from 454, as substitutions are found much more
frequently than indels, and substitutions are usually easier to handle in
downstream analyses like mapping or assembly.
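The output figures for a full HiSeq run can be checked with a short calculation. The sketch assumes single-end reads whose length equals the number of cycles and uses only the numbers quoted above; the function name is hypothetical.

```python
def run_yield_bases(reads_per_lane: int, lanes: int, cycles: int) -> int:
    """Total bases of a run: every read contributes one base per cycle."""
    return reads_per_lane * lanes * cycles


exact = run_yield_bases(reads_per_lane=600_000_000, lanes=8, cycles=150)
rounded = run_yield_bases(reads_per_lane=5_000_000_000, lanes=1, cycles=150)
print(exact / 1e9, "Gb with 600 million reads in each of 8 lanes")   # 720.0
print(rounded / 1e9, "Gb when rounding to 5 billion reads per run")  # 750.0
```

The 750 Gb given in the text corresponds to the rounded figure of nearly 5 billion sequences; with exactly 600 million sequences per lane the result is 720 Gb.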
Illumina sequencing is currently
the standard for transcriptome sequencing and the resequencing of
genomes. The read length of earlier chemistry and machines was
limited to between 36 bp and 76 bp. However, with the availability
of longer reads, this technique is now also frequently used in
metabarcoding and metagenomic studies.
3.4 Ion Semiconductor Sequencing (Ion Torrent)
The Personal Genome Machine (PGM)
released by Ion Torrent in 2010 was a totally new approach based on
ion semiconductor sequencing (Rothberg et al. 2011). After Ion Torrent was purchased by Life
Technologies, a platform with even higher throughput was released:
the Ion Proton. In contrast to Sanger, Illumina and 454, this
technique does not rely on analysing optical signals. Instead,
changes of hydrogen ion concentration are analysed. Whenever a
nucleotide is incorporated into a DNA strand by polymerase
activity, a hydrogen ion (proton) is released (◘ Fig. 3.4). The release of this
proton can be measured in real time by ion-sensitive field-effect
transistors (ISFETs) (Sakurai and Husimi 1992). By using available methodology and
software from modern imaging devices (laptops, digital cameras), an
array has been built for the large-scale use of ISFETs. This
technique is called complementary metal-oxide semiconductor (CMOS)
process (Rothberg et al. 2011).
Every sensor in this array directly monitors the hydrogen ion
release during sequencing. Each chip of the sequencer contains
between 1 and 660 million sensors, each associated with a well
that holds an acrylamide bead carrying a DNA template, as well as the
dNTPs. The chip size can be chosen according to the required number
of reads for the sequencing project. As in 454 sequencing, the
wells are flooded in cycles with one type of dNTP at a time. Below
the well lies a metal-oxide sensing layer, which itself is on top
of a sensor plate and floating metal «gate» for the transmission of
electronic information about the pH changes to the semiconductor
(◘ Fig. 3.4).
The detected changes in pH allow inferring whether and how many bases
have been incorporated into a sequence read. Relying on a purely
electronic detection system without any optical components allows a
considerably lower instrument cost compared to other NGS platforms.
Fig. 3.4 Principle of ion semiconductor sequencing.
A well containing a bead with a DNA fragment is shown.
Incorporation of a nucleotide releases a proton (H⁺),
which changes the pH in the well. This release changes the
potential in an underlying metal-oxide sensing layer, which is
received by a transistor (Reprinted by permission from Macmillan
Publishers Ltd: Nature (Rothberg et al. 2011), copyright 2011)
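Inferring "if and how many bases" were incorporated from a pH signal can be sketched as a rounding step per nucleotide flow. The sketch rests on several assumptions that are not taken from the platform documentation: signals are already normalised so that one incorporation gives a value near 1.0, the flow order is `TACG`, and the function name is hypothetical.

```python
def call_flows(flow_order: str, signals: list[float]) -> str:
    """Round each normalised signal to an incorporation count and emit the bases."""
    read = []
    for i, signal in enumerate(signals):
        count = max(0, round(signal))          # 0 = no incorporation in this flow
        nucleotide = flow_order[i % len(flow_order)]
        read.append(nucleotide * count)
    return "".join(read)


print(call_flows("TACG", [1.04, 0.02, 2.10, 0.90]))  # TCCG
print(call_flows("TACG", [0.97, 0.05, 4.40, 1.10]))  # TCCCCG
print(call_flows("TACG", [0.97, 0.05, 4.60, 1.10]))  # TCCCCCG
```

The last two calls differ only by a small change in one signal, yet yield four versus five C's, which illustrates why indels in homopolymer stretches are the prevailing error type of flow-based methods.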
The library preparation is similar to
the 454 technique. Adaptors are ligated to DNA molecules, which are
loaded onto magnetic beads. Molecules are amplified using emulsion
PCR. Each well is sized to fit one bead, and the beads are loaded onto the
chips by a centrifugation step.
Two different sequence platforms are
available (PGM and Ion Proton), and these can be run with
differently sized chips. Chips for the PGM bear fewer sensors (PGM
314: 1.2 million; PGM 316: 6.1 million; PGM 318: 11 million), with
an expected output of 500,000 to 5.5 million sequences. With a read
length of up to 400 bp, a single run using the largest chip would
generate an output of ~2 Gb. Depending on the chip
size, the runtime of 2 to 7 h is relatively short compared with other techniques.
The chips of the Ion Proton system are larger (PI, 165
million sensors; PII, 660 million). Using the largest chip, ~330
million reads of up to 200 bp can be generated in a run which
lasts between 2 and 4 h. This would equal an output of about 66 Gb.
At ~98%, the accuracy is lower than for the 454 and Illumina
platforms (Quail et al. 2012;
Merriman et al. 2012). Similar to
the 454 technique, indels are the prevailing error type. However,
the biggest advantage of this technique is speed. The library
preparation should take less than 6 h, and sequencing runs are
finished in a few hours. Using this approach, it is possible to
complete bacterial genomes, from extracted DNA to assembly, in less
than 3 days. This rapid methodology proved useful for
monitoring and characterizing bacterial genomes during an
E. coli outbreak in Germany in spring 2011 (Mellmann et al. 2011).
3.5 Single-Molecule Real-Time (SMRT) Sequencing (PacBio)
The sequencing techniques discussed so far
produce rather short sequence reads, mostly below 1000
bp. Machines based on a new sequencing method targeting single
molecules, which are able to produce considerably longer reads,
have been available since 2011 from the company Pacific
Biosciences (PacBio). Here, polymerase activity is monitored without interruption
while incorporating four differently fluorescent-tagged dNTPs (Eid
et al. 2009). These phospho-linked
nucleotides carry a fluorescent label on the phosphate group of the
nucleotide, and the label is cleaved away after incorporation. By using
real-time imaging, nucleotides are detected while they
are incorporated along a single DNA template molecule. The detection
takes place in a zero-mode waveguide (ZMW) microwell, which is a
nanophotonic structure surrounded by aluminium. Each ZMW measures
only 70 nm in diameter and 100 nm in depth, leaving an observation
volume of 20 × 10⁻²¹ litres. A single molecule of Φ29
DNA polymerase is attached to the surface of the ZMW, and its
activity can be measured (◘ Fig. 3.5). The small volume of the ZMWs reduces the
amount of background noise due to the presence of
fluorescent-labelled nucleotides. While detecting the level of
fluorescence intensity in a single ZMW, a more or less stable
background level is measured. An association of a phospho-linked
nucleotide with the template DNA in the polymerase active site
triggers a pulse of fluorescence intensity for the corresponding
dye. This light emission lasts some milliseconds and is recorded
by the detector of the ZMW. The fluorescence label is cleaved by
the DNA polymerase, leaving a phosphodiester bond which allows the
elongation of the growing DNA strand. The cleaved dye diffuses away, leading
to a drop of the recorded emission intensity back to the background
level. The next nucleotide can be incorporated and the measurement
repeats (◘ Fig. 3.5). The synthesis rate is around two to four
bases per second. In contrast to all other methods described so
far, SMRT sequencing does not interrupt the process of DNA
synthesis. Interestingly, it has been shown that the emission
spectra contain more information besides the nucleotide identity.
The duration of an emission and the interval between successive
emissions also reveal information about nucleotide modifications.
Using this data, a genome-wide mapping of methylation patterns
becomes possible (Flusberg et al. 2010).
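The two kinetic quantities mentioned above, the duration of an emission and the interval between successive emissions, can be derived from a list of detected pulses. The sketch below uses made-up pulse times in milliseconds and a hypothetical `Pulse` record; it only illustrates the bookkeeping, not PacBio's actual signal processing.

```python
from typing import NamedTuple


class Pulse(NamedTuple):
    start_ms: int   # time at which the fluorescence rises above background
    end_ms: int     # time at which it drops back to background
    base: str       # base assigned from the colour of the dye


def summarise(pulses: list[Pulse]) -> tuple[str, list[int], list[int]]:
    """Return the read, the pulse widths and the interpulse durations."""
    ordered = sorted(pulses, key=lambda p: p.start_ms)
    read = "".join(p.base for p in ordered)
    widths = [p.end_ms - p.start_ms for p in ordered]
    interpulse = [nxt.start_ms - cur.end_ms for cur, nxt in zip(ordered, ordered[1:])]
    return read, widths, interpulse


pulses = [Pulse(0, 20, "A"), Pulse(300, 330, "C"), Pulse(1500, 1515, "G")]
read, widths, interpulse = summarise(pulses)
print(read)        # ACG
print(widths)      # [20, 30, 15]
print(interpulse)  # [280, 1170]
```

In this invented trace the polymerase pauses much longer before the third base; comparing such intervals against an unmodified control is the kind of signal that can point to a modified nucleotide in the template.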
Fig. 3.5 Principle of single-molecule real-time
(SMRT) sequencing. a A single
molecule of Φ29 DNA polymerase is attached to the surface of a
zero-mode waveguide (ZMW) microwell, and its activity while copying
a DNA molecule is measured. b
Emission spectra are detected, while fluorescent-labelled
nucleotides are incorporated by the polymerase (Reprinted by
permission of Pacific Biosciences)
High-quality DNA is needed in large
quantities for the library preparation, as no additional
amplification step is included. Genomic DNA is sheared to the
desired average DNA length, ends are repaired and hairpin adaptors
are ligated to these ends. Hairpin adaptors represent
single-stranded loops to which the sequencing primer can bind. The
construct of the double-stranded template DNA flanked by two
hairpin loops is called SMRTbell (Travers et al. 2010). Using polymerase with strand displacement
activity, a primer binding to the hairpin adaptor can be extended
displacing one DNA strand, while the other is used as a template.
Sequencing has even been performed without any library
preparation (Coupland et al. 2012).
Whereas there seems to be no negative effect on the read length,
the output was considerably lower than for standard library
preparation. As no ligated adaptors are present, known primer
regions or random hexamer primers can be used for sequencing.
SMRT sequencing has the advantage that
the sequencing process, at 4 h, is rather fast. The read length is
long, averaging around 15 Kb, and reads may exceed 50 Kb
(Lee et al. 2014).
Moreover, single molecules are sequenced, and modifications are
detected as additional information. However, a caveat of the
technique is the high error rate. An accuracy of ~80–85% is reported
for single pass reads (Hackl et al. 2014). Even though a large fraction of the
sequencing errors seems to stem from deletions and insertions, no
significant sequencing bias has been found (Ross et al.
2013). At 2.8 Gb of
sequencing data per day (PacBio RSII), the output is rather low compared with
other NGS techniques. Nevertheless, the availability of long reads
from this technique dramatically increases the quality of genome
and transcriptome assemblies (Koren and Phillippy 2015; Tilgner et al. 2014). Currently, long reads via SMRT sequencing
represent the gold standard for the de novo assembly of genomes, as
a more complete picture of gene content, structural variation and
repeat biology can be achieved (Gordon et al. 2016).
3.6 Nanopore Sequencing
The principle of DNA (and RNA)
sequencing using nanopores was first demonstrated in the mid-1990s
(Kasianowicz et al. 1996). For the
first sequencing experiments, a staphylococcal α-hemolysin
protein pore was incorporated into a phospholipid bilayer separating
two reservoirs filled with a salt solution. By applying a
voltage using electrodes placed on opposite sides of the
bilayer, negatively charged DNA molecules are forced to pass
through the nanopore channel, which has a diameter of a few nm.
Nucleotides passing the pore characteristically decrease the
amplitude of the ionic current and can thereby be detected. Using this
system, even methylated cytosines can be distinguished from the
four standard DNA bases (Clarke et al. 2009; Branton et al. 2008).
The company Oxford Nanopore
Technologies (ONT) constructed a series of sequencing devices based
on this technique, which are available (MinION) or currently
entering the market (PromethION, SmidgeION). The biggest problem of
the technique is the immense speed at which the DNA strand
passes through the nanopore. This leads to a decrease of
resolution when detecting nucleotides in the channel of the pore.
Currently ONT is developing two different systems for DNA
sequencing: exonuclease sequencing and strand sequencing (Clarke et al.
2009). At the time of writing this chapter, sequencing data
were available only for the latter.
In strand sequencing, double-stranded
DNA is ratcheted through the nanopore by a «motor protein», a
process by which it becomes single stranded. For library
preparation, sheared DNA is end-repaired, and a hairpin adaptor is
ligated to one end of the molecule, while the motor protein is
ligated to the other (Goodwin et al. 2015). During sequencing, one strand is passing
the pore, followed by the hairpin adaptor and the other strand. If
both strands are sequenced, consensus sequences of the two
complementary strands are produced, which are termed 2D reads. Due
to the speed of this process, multiple bases are present in the
pore at a time, and a collection of overlapping k-mers (usually 5-mers, i.e. fragments of
five consecutive nucleotides) is recorded as signal. Base calling,
which is conducted by using the software MinKNOW as a cloud
application, needs to distinguish 4⁵ (1024) possible
ionic current states for all possible 5-mers. Not surprisingly, a
high error rate is reported for all reads produced by this
technique so far, averaging around 12% (Ip et al. 2015). However, several strategies for error
correction are available (Loman et al. 2015; Goodwin et al. 2015), and the development of new sequencing
chemistries has already improved the quality of reads dramatically. The
development of this technique progresses so fast that numbers
mentioned in this chapter will likely be already outdated when
printed. ONT developed several sequencing platforms using this
technique; however, while writing this chapter in winter 2016, only
data for the MinION nanopore was available. An early release of
this device to selected laboratories was announced in 2013 (MinION
access program) and started in early 2014. Since 2015, the MinION,
a miniature DNA sequencer the size of an MP3 player (◘ Fig. 3.6),
has been commercially available. A starter pack including the sequencing
device, two flow cells and a library preparation kit can be
purchased for $1000.
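The k-mer model mentioned above, with 4⁵ = 1024 possible 5-mers and overlapping 5-mers passing through the pore, can be made concrete with a few lines. The function names are hypothetical; the example only enumerates states and windows and does not model ionic currents.

```python
from itertools import product


def all_kmers(k: int, alphabet: str = "ACGT") -> list[str]:
    """Enumerate every possible k-mer over the four standard DNA bases."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def overlapping_kmers(seq: str, k: int = 5) -> list[str]:
    """List the consecutive, overlapping k-mers a strand presents while passing the pore."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


print(len(all_kmers(5)))              # 1024
print(overlapping_kmers("GATTACAG"))  # ['GATTA', 'ATTAC', 'TTACA', 'TACAG']
```

Each step of the strand shifts the window by one base, so successive signals share four of their five nucleotides; a base caller has to pick the most plausible path through these overlapping states.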
Fig. 3.6 The MinION sequencing device from Oxford
Nanopore Technologies (Picture reprinted with permission of Oxford
Nanopore Technologies)
The MinION is equipped with 512
channels with four nanopores each, each of them detecting 50 to 250
bp per second depending on run mode and chemistry. With the R7
chemistry, an output ranging from 90 to 490 Mbp per 48 h is
reported, with average read lengths around 6 kbp and maximum read
lengths of up to 150 kbp (Ashton et al. 2015; Quick et al. 2014; Goodwin et al. 2015). Initial numbers for the currently
distributed R9 chemistry are higher (Istace et al. 2016). Using MinION long reads, it was possible
to assemble a complete Escherichia
coli genome and, in a hybrid assembly with Illumina short
reads, the yeast genome (Goodwin et al. 2015; Loman et al. 2015). With the genome of the nematode
Caenorhabditis elegans, the
first animal genome has been sequenced using MinION nanopore long
reads only (Tyson et al. 2017). A
system with higher output (PromethION) is currently being delivered to
selected laboratories by ONT. Moreover, further developments
exploring other biological nanopores or focusing on synthetic nanopores are
under investigation (Feng et al. 2015; Wang et al. 2014). If the still rather high error rate can
be decreased in future updates, nanopore sequencing might challenge
PacBio’s status as the gold standard for whole-genome sequencing.
Due to the speed and the quite simple library preparation,
real-time monitoring in metagenomic frameworks and sequencing in
the field are possible. For example, Ebola virus surveillance in
the field using nanopore sequencing during an outbreak in Western
Africa has been demonstrated (Quick et al. 2016). Moreover, methods like «real-time
selective sequencing» will truly help to exploit the power of
real-time sequencing (Loose et al. 2016). Currently, we are only beginning to see the potential of
this technique unfold.
3.7 Comparison of Sequencing Platforms
A broad array of different sequencing
techniques has become commercially available in the last decade,
revolutionizing the field of evolutionary genomics. Besides the
techniques discussed here, some other platforms are available (e.g.
SOLiD, Helicos) (Bowers et al. 2009; Valouev et al. 2008), which are used less in phylogenomic
studies and have, or will have, their greatest potential
in clinical applications such as screening for nucleotide
polymorphisms, ChIP-seq (chromatin immunoprecipitation DNA
sequencing) or resequencing genomes. In 2015, the Beijing Genomics
Institute released its sequencing platform called BGISEQ-500, which
comes close to the output of Illumina’s HiSeq platforms (Goodwin et
al. 2016). Several new approaches
for DNA sequencing are under development, which are still years
away from being commercially available, e.g. transmission electron
microscopy DNA sequencing (Bell et al. 2012). The impact of the new sequencing
techniques and how they transformed the field of genomics can be most
easily seen in the dramatic decrease in sequencing costs. Starting
with the year 2000, the price per raw megabase of DNA sequencing
decreased during the first 7 years of this century in line with Moore’s
law (Moore 1965). This basically
means that the amount of sequence data that can be generated for a fixed
price should double approximately every 2 years
(Mardis 2008). Starting in 2007,
with the arrival of newly available sequencing platforms (454,
Illumina), the costs per raw megabase decreased dramatically
(◘ Fig. 3.7),
basically allowing small laboratories access to genome and
transcriptome sequencing. The development of these new techniques
made the $1000 human genome a reality (► see Infobox
3.1).
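To make the comparison with Moore’s law concrete, the following minimal Python sketch projects the cost per raw megabase under the assumption that the cost halves every 2 years (equivalently, that the amount of data obtainable for a fixed price doubles every 2 years). The function names and the starting cost of 1000 are hypothetical illustration values and are not taken from the NHGRI data.

```python
def projected_cost(initial_cost, years_elapsed, halving_time_years=2.0):
    """Cost after `years_elapsed` years if it halves every `halving_time_years`.

    Assumption for illustration: a smooth exponential decline, as implied
    by a Moore's-law-like trend.
    """
    return initial_cost * 0.5 ** (years_elapsed / halving_time_years)


def data_fold_increase(years_elapsed, doubling_time_years=2.0):
    """Fold increase in data obtainable for a fixed price after `years_elapsed`."""
    return 2.0 ** (years_elapsed / doubling_time_years)


if __name__ == "__main__":
    start_cost = 1000.0  # hypothetical cost per raw megabase in year 0
    for year in (0, 2, 4, 6):
        print(year, projected_cost(start_cost, year), data_fold_increase(year))
```

Observed costs that fall well below such a projection, as reported for the years after 2007, indicate a decline faster than the Moore’s law trend.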
Fig. 3.7
Decrease in sequencing costs per raw
megabase of DNA sequence over the last 15 years (Reprinted from:
Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome
Sequencing Program (GSP). Available at: ► www.genome.gov/sequencingcostsdata.
Accessed January 2017)
Sanger sequencing, which is still the
gold standard in terms of read quality, is now also known as the
first generation of sequencing. Second-generation sequencing
platforms are 454, Illumina and Ion Torrent, all of them massively
parallelized for high-throughput data generation, but restricted to
short-read lengths. The newly launched nanopore sequencers and
PacBio’s SMRT sequencing are the third generation of sequencers,
which have less output than second-generation machines, but are
capable of single-molecule sequencing, which also allows
the detection of epigenetic modifications. While the read lengths
of these machines are much longer than for all other available
sequencing techniques, they are still error prone. However, the
development of refined sequencing chemistry and technical updates
of sequencing machines raise hope for higher-quality data in
the near future.
With so many different sequencing platforms available, varying in costs, quality and
output (◘ Fig. 3.8), it becomes more difficult to decide strategically
which technique should be used when planning phylogenomic
studies and, where applicable, which sequencers should be purchased by
laboratories working in the field of evolution. By far, the highest
output is generated by Illumina’s HiSeq platforms. The acquisition
of a HiSeq is expensive, and these machines can usually only be
fully exploited by sequencing centres or very large laboratories.
The same is true for PacBio systems. Illumina’s MiSeq and Ion
Torrent’s PGM are affordable for smaller labs. However, the cost
per base for these machines is usually much higher than for
Illumina’s HiSeq (◘ Fig. 3.8). Nevertheless, these machines are well
suited for targeted sequencing strategies or sequencing of complete
prokaryote genomes. Genome projects dealing with eukaryote genomes
should envisage a hybrid strategy combining a high coverage of
short-read data from second-generation sequencers (e.g. Illumina)
and low coverage of long reads from the third generation (e.g.
PacBio, Nanopore). Assemblies solely based on long reads might
result in the highest quality, but could still be too expensive for
most projects.
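When planning such a hybrid strategy, a first rough estimate of the required sequencing effort follows from the common definition of average coverage as the total number of sequenced bases divided by the genome size. The sketch below applies this relation in Python. The function name, the genome size, the coverage targets and the read lengths are hypothetical values chosen for illustration only; real projects must also account for read-length distributions, error rates and uneven coverage.

```python
def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Number of reads needed to reach an average `coverage`.

    Uses coverage = (number of reads * read length) / genome size,
    assuming all reads have the same length (a simplification).
    """
    total_bases = genome_size_bp * coverage
    # Integer ceiling division: round up to a whole number of reads.
    return (total_bases + read_length_bp - 1) // read_length_bp


if __name__ == "__main__":
    genome_size = 100_000_000  # hypothetical 100 Mb eukaryote genome
    # Hypothetical targets: 50x short reads of 150 bp, 10x long reads of 10 kb.
    print("short reads:", reads_needed(genome_size, 50, 150))
    print("long reads:", reads_needed(genome_size, 10, 10_000))
```

The same function can be used to compare how many reads different platforms would have to deliver for a given target coverage before consulting their output and cost figures.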
Fig. 3.8
Output (in bp), runtime (in days) and costs
(in $) of different sequencing platforms discussed in this chapter.
The cost of resequencing a human genome with 30× coverage is
indicated (hg-30×) (Reprinted with permission of Albert Vilella,
► http://twitter.com/albertvilella)
3.7.1 Infobox 3.1 The $1000 Genome
Deciphering the human genome took an
international collaboration more than a decade, and the costs were
estimated to be around 3 billion US dollars when announced in 2001.
It became clear that sequencing costs had to be reduced
dramatically if subsequent studies targeting haplotype diversity
across humans and the use of genomic analysis for routine medical
applications were to be realized. Discussions about this topic in an
expert round delivered the catchphrase that the «$1000 genome»
should be targeted. When announced in the early 2000s, this claim
seemed utopian and was first revised to target the «$100,000
genome», which still would be a more than 100-fold decrease in
sequencing costs. It is important to keep in mind that a 100-fold
decrease in costs would allow a 100-fold increase in data for the
same price – adding much needed statistical power for many desired
studies targeting the genetic background of diseases. To achieve
these goals, the US-based National Human Genome Research Institute
(NHGRI) launched programs actively funding sequencing technology
developments. Several sequencing centres and university start-up
companies greatly benefited from this rich source of funding. With
454 pyrosequencing, the first of the many NGS techniques to follow
became available while directly being supported by these programs.
Using this technique, the complete genome sequence of James Watson
– the scientist who was directly involved in the discovery of the
DNA double helix – was sequenced in less than 2 months for well
under 1 million US dollars (Wolinsky 2007; Wheeler et al. 2008). The advent and development of Illumina
sequencing again strongly decreased the costs. Already in 2013,
costs of around $5000 for a 30× coverage human genome with Illumina
short reads were estimated by the NHGRI. The launch of Illumina’s
HiSeq X Ten system, which is an array of ten sequencers with massive
output, finally achieved the goal of sequencing human genomes for
less than $1000 in early 2014. In January 2017, Illumina announced
that with the NovaSeq sequencing platform, it might be possible in
the near future to sequence a human genome for $100. However, there
is no real consensus on how to calculate these costs, as in addition to the pure
costs for sequencing, further costs for personnel, electricity
and analysis should be added. Whereas genome sequencing became
extremely cheap, costs for analysing all these data remain high as
highly trained scientists have to do this step. Or, as Elaine Mardis
famously phrased it, «The $1000 genome, the $100,000 analysis?»
(Mardis 2010).
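The cost figures quoted in this infobox can be put into proportion with simple arithmetic. The following Python sketch computes fold decreases relative to the estimated cost of the first human genome. The function and variable names are hypothetical, and the dollar amounts are the rounded estimates given in the text, with stated upper bounds (well under 1 million, less than 1000) taken at the bound itself.

```python
def fold_decrease(old_cost, new_cost):
    """How many times cheaper `new_cost` is compared with `old_cost`."""
    return old_cost / new_cost


# Rounded cost estimates in US dollars as quoted in the text.
costs = {
    "first human genome (2001)": 3_000_000_000,
    "revised target ($100,000 genome)": 100_000,
    "Watson genome with 454 (upper bound)": 1_000_000,
    "30x genome with Illumina (2013)": 5_000,
    "30x genome (early 2014, upper bound)": 1_000,
}

if __name__ == "__main__":
    reference = costs["first human genome (2001)"]
    for label, cost in costs.items():
        print(f"{label}: {fold_decrease(reference, cost):,.0f}-fold cheaper")
```

Read the other way round, each fold decrease equals the fold increase in the number of genomes that could be sequenced for the same budget, which is the gain in statistical power referred to above.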
References
Bentley DR, Balasubramanian
S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ,
Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira
Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ,
Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS,
Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR,
Rasolonjatovo IMJ, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot
A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A,
Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam
MD, Anastasi C, Aniebo IC, Bailey DMD, Bancarz IR, Banerjee S,
Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ,
Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann
DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E,
Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos
KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW,
Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott
Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA,
Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD,
Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov
DV, Johnson MQ, James T, Huw Jones TA, Kang G-D, Kerelska TH,
Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales
PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch
JA, Lok M, Luo S, Mammen RM, Martin JW, PG MC, McNitt P, Mehta P,
Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM,
O’Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL,
Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP,
Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva
Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N,
Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper
RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL,
Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K,
Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, Vandevondele S,
Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ,
Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, NJ MC,
West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ
(2008) Accurate whole human genome sequencing using reversible
terminator chemistry. Nature 456:53–59
Bowers J, Mitchell J, Beer E,
Buzby PR, Causey M, Efcavitch JW, Jarosz M, Krzymanska-Olejnik E,
Kung L, Lipson D, Lowman GM, Marappan S, McInerney P, Platt A, Roy
A, Siddiqi SM, Steinmann K, Thompson JF (2009) Virtual terminator
nucleotides for next-generation DNA sequencing. Nat Methods
6:593–595
Branton D, Deamer DW,
Marziali A, Bayley H, Benner SA, Butler T, Di Ventra M, Garaj S,
Hibbs A, Huang X, Jovanovich SB, Krstic PS, Lindsay S, Ling XS,
Mastrangelo CH, Meller A, Oliver JS, Pershin YV, Ramsey JM, Riehn
R, Soni GV, Tabard-Cossa V, Wanunu M, Wiggin M, Schloss JA (2008)
The potential and challenges of nanopore sequencing. Nat Biotechnol
26:1146–1153
Brenner S (2014) Frederick
Sanger (1918–2013). Science 343:262
Eid J, Fehr A, Gray J, Luong
K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo
A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal
R, deWinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner
C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S,
Lundquist P, Ma C, Marks P, Maxham M, Murphy D, Park I, Pham T,
Phillips M, Roy J, Sebra R, Shen G, Sorenson J, Tomaney A, Travers
K, Trulson M, Vieceli J, Wegener J, Wu D, Yang A, Zaccarin D, Zhao
P, Zhong F, Korlach J, Turner S (2009) Real-time DNA sequencing
from single polymerase molecules. Science 323:133–138
Feng Y, Zhang Y, Ying C,
Wang D, Du C (2015) Nanopore-based fourth-generation DNA sequencing
technology. Genomics Proteomics Bioinformatics 13:4–16
Flusberg BA, Webster DR, Lee
JH, Travers KJ, Olivares EC, Clark TA, Korlach J, Turner SW (2010)
Direct detection of DNA methylation during single-molecule,
real-time sequencing. Nat Methods 7:461–465
Gilles A, Meglecz E, Pech N,
Ferreira S, Malausa T, Martin J-F (2011) Accuracy and quality
assessment of 454 GS-FLX titanium pyrosequencing. BMC Genomics
12:245
Goodwin S, Gurtowski J,
Ethe-Sayers S, Deshpande P, Schatz MC, McCombie WR (2015) Oxford
nanopore sequencing, hybrid error correction, and de novo assembly
of a eukaryotic genome. Genome Res 25:1750–1756
Gordon D, Huddleston J,
Chaisson MJP, Hill CM, Kronenberg ZN, Munson KM, Malig M, Raja A,
Fiddes I, Hillier LW, Dunn C, Baker C, Armstrong J, Diekhans M,
Paten B, Shendure J, Wilson RK, Haussler D, Chin C-S, Eichler EE
(2016) Long-read sequence assembly of the gorilla genome. Science
352:aae0344
Hackl T, Hedrich R, Schultz
J, Förster F (2014) Proovread: large-scale high-accuracy PacBio
correction through iterative short read consensus. Bioinformatics
30:3004–3011
Huse S, Huber J, Morrison H,
Sogin M, Welch D (2007) Accuracy and quality of massively parallel
DNA pyrosequencing. Genome Biol 8:R143
Ip C, Loose M, Tyson J, de
Cesare M, Brown B, Jain M, Leggett R, Eccles D, Zalunin V, Urban J,
Piazza P, Bowden R, Paten B, Mwaigwisya S, Batty E, Simpson J,
Snutch T, Birney E, Buck D, Goodwin S, Jansen H, O'Grady J, Olsen
H (2015) MinION Analysis and Reference Consortium: Phase 1
data release and analysis [version 1; referees: 2 approved].
F1000Res 4:1075
Istace B, Friedrich A,
d’Agata L, Faye S, Payen E, Beluche O, Caradec C, Davidas S, Cruaud
C, Liti G, Lemainque A, Engelen S, Wincker P, Schacherer J, Aury
J-M (2016) de novo assembly and population genomic survey of
natural yeast isolates with the Oxford Nanopore MinION sequencer.
bioRxiv. doi.org/10.1101/066613
Kasianowicz JJ, Brandin E,
Branton D, Deamer DW (1996) Characterization of individual
polynucleotide molecules using a membrane channel. Proc Natl Acad
Sci U S A 93:13770–13773
Lee H, Gurtowski J, Yoo S,
Marcus S, McCombie WR, Schatz M (2014) Error correction and
assembly complexity of single molecule sequencing reads.
bioRxiv:006395
Liu L, Li Y, Li S, Hu N, He
Y, Pong R, Lin D, Lu L, Law M (2012) Comparison of next-generation
sequencing systems. J Biomed Biotechnol 2012:251364
Loose M, Malla S, Stout M
(2016) Real-time selective sequencing using nanopore technology.
Nat Methods 13:751–754
Mardis E (2010) The $1,000
genome, the $100,000 analysis? Genome Med 2:84
Margulies M, Egholm M,
Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS,
Chen Y-J, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC,
He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie
TP, Jirage KB, Kim J-B, Knight JR, Lanza JR, Leamon JH, Lefkowitz
SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna
MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT,
Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro
KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu
P, Begley RF, Rothberg JM (2005) Genome sequencing in
microfabricated high-density picolitre reactors. Nature
437:376–380
Maxam AM, Gilbert W (1977) A
new method for sequencing DNA. Proc Natl Acad Sci U S A
74:560–564
Mellmann A, Harmsen D,
Cummings CA, Zentz EB, Leopold SR, Rico A, Prior K, Szczepanowski
R, Ji Y, Zhang W, McLaughlin SF, Henkhaus JK, Leopold B,
Bielaszewska M, Prager R, Brzoska PM, Moore RL, Guenther S,
Rothberg JM, Karch H (2011) Prospective genomic characterization of
the German enterohemorrhagic Escherichia coli O104:H4 outbreak by
rapid next generation sequencing technology. PLoS One
6:e22751
Moore G (1965) Cramming more
components onto integrated circuits. Electronics
38:114–117
Petrosino JF, Highlander S,
Luna RA, Gibbs RA, Versalovic J (2009) Metagenomic pyrosequencing
and microbial identification. Clin Chem 55:856–866
Quail M, Smith M, Coupland
P, Otto T, Harris S, Connor T, Bertoni A, Swerdlow H, Gu Y (2012) A
tale of three next generation sequencing platforms: comparison of
ion torrent, Pacific biosciences and Illumina MiSeq sequencers. BMC
Genomics 13:341
Quick J, Quinlan A, Loman N
(2014) A reference bacterial genome dataset generated on the MinION
portable single-molecule nanopore sequencer. Gigascience
3:22
Quick J, Loman NJ, Duraffour
S, Simpson JT, Severi E, Cowley L, Bore JA, Koundouno R, Dudas G,
Mikhail A, Ouédraogo N, Afrough B, Bah A, Baum JHJ, Becker-Ziaja B,
Boettcher JP, Cabeza-Cabrerizo M, Camino-Sánchez Á, Carter LL,
Doerrbecker J, Enkirch T, Dorival IG, Hetzelt N, Hinzmann J, Holm
T, Kafetzopoulou LE, Koropogui M, Kosgey A, Kuisma E, Logue CH,
Mazzarelli A, Meisel S, Mertens M, Michel J, Ngabo D, Nitzsche K,
Pallasch E, Patrono LV, Portmann J, Repits JG, Rickett NY, Sachse
A, Singethan K, Vitoriano I, Yemanaberhan RL, Zekeng EG, Racine T,
Bello A, Sall AA, Faye O, Faye O, Magassouba NF, Williams CV,
Amburgey V, Winona L, Davis E, Gerlach J, Washington F, Monteil V,
Jourdain M, Bererd M, Camara A, Somlare H, Camara A, Gerard M, Bado
G, Baillet B, Delaune D, Nebie KY, Diarra A, Savane Y, Pallawo RB,
Gutierrez GJ, Milhano N, Roger I, Williams CJ, Yattara F,
Lewandowski K, Taylor J, Rachwal P, Turner DJ, Pollakis G, Hiscox
JA, Matthews DA, MKO S, Johnston AM, Wilson D, Hutley E, Smit E, Di
Caro A, Wölfel R, Stoecker K, Fleischmann E, Gabriel M, Weller SA,
Koivogui L, Diallo B, Keïta S, Rambaut A, Formenty P, Günther S,
Carroll MW (2016) Real-time, portable genome sequencing for Ebola
surveillance. Nature 530:228–232
Ross M, Russ C, Costello M,
Hollinger A, Lennon N, Hegarty R, Nusbaum C, Jaffe D (2013)
Characterizing and measuring bias in sequence data. Genome Biol
14:R51
Rothberg JM, Hinz W, Rearick
TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew
MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF,
Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M,
Branciforte JT, Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N,
Sedova M, Miao X, Reed B, Sabina J, Feierstein E, Schorn M,
Alanjary M, Dimalanta E, Dressman D, Kasinskas R, Sokolsky T,
Fidanza JA, Namsaraev E, McKernan KJ, Williams A, Roth GT, Bustillo
J (2011) An integrated semiconductor device enabling non-optical
genome sequencing. Nature 475:348–352
Sanger F, Nicklen S, Coulson
AR (1977) DNA sequencing with chain-terminating inhibitors. Proc
Natl Acad Sci U S A 74:5463–5467
Tilgner H, Grubert F, Sharon
D, Snyder MP (2014) Defining a personal, allele-specific, and
single-molecule long-read transcriptome. Proc Natl Acad Sci U S A
111:9869–9874
Travers KJ, Chin C-S, Rank
DR, Eid JS, Turner SW (2010) A flexible and efficient template
format for circular consensus sequencing and SNP detection. Nucleic
Acids Res 38:e159
Tyson JR, O’Neil NJ, Jain M,
Olsen HE, Hieter P, Snutch TP (2017) Whole genome sequencing and
assembly of a Caenorhabditis
elegans genome with complex genomic rearrangements using the
MinION sequencing device. bioRxiv. doi.org/10.1101/099143
Valouev A, Ichikawa J,
Tonthat T, Stuart J, Ranade S, Peckham H, Zeng K, Malek JA, Costa
G, McKernan K, Sidow A, Fire A, Johnson SM (2008) A
high-resolution, nucleosome position map of C. elegans reveals a lack of universal
sequence-dictated positioning. Genome Res 18:1051–1063
Wang Y, Yang Q, Wang Z
(2014) The evolution of nanopore sequencing. Front Genet
5:449
Wheeler DA, Srinivasan M,
Egholm M, Shen Y, Chen L, McGuire A, He W, Chen Y-J, Makhijani V,
Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski
JR, Chinault C, X-z S, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM,
Margulies M, Weinstock GM, Gibbs RA, Rothberg JM (2008) The
complete genome of an individual by massively parallel DNA
sequencing. Nature 452:872–876
Wolinsky H (2007) The
thousand-dollar genome. EMBO Rep 8:900–903
Yu DW, Ji Y, Emerson BC,
Wang X, Ye C, Yang C, Ding Z (2012) Biodiversity soup:
metabarcoding of arthropods for rapid biodiversity assessment and
biomonitoring. Methods Ecol Evol 3:613–623