Christoph Bleidorn, Phylogenomics, 10.1007/978-3-319-54064-1_4

4. Sequencing Strategies

Christoph Bleidorn¹

(1)

Museo Nacional de Ciencias Naturales, Spanish National Research Council (CSIC), Madrid, Spain

Shotgun strategies sequence whole genomes as small fragments, which are afterwards assembled into longer contigs.
RADseq strategies provide a reduced but consistent set of sequences of the genome and are used especially for population genetics.
Hybrid enrichment describes the specific enrichment of preselected sequences.
RNA-Seq analyses characterize the sequence content and the corresponding expression levels of transcriptomes.
Technical developments paved the way to sequencing the genomes and transcriptomes of single cells.

4.1 Shotgun Sequencing

The length of prokaryote and eukaryote genomes exceeds by far the length of sequence reads produced by available technologies. Moreover, in the case of eukaryotes, the genomic information is distributed across a number of chromosomes. Therefore, different strategies have been developed for complete genome sequencing. Many of these methods have been explored in the course of the human genome project, e.g. transposon-based methods to integrate random insertions into cloned DNA or multiplex PCR strategies (Green 2001; Church and Kieffer-Higgins 1988). However, the most common method is shotgun sequencing, which was developed in the early 1980s (Anderson 1981; Gardner et al. 1981). For shotgun sequencing, a large stretch of DNA is fragmented into smaller pieces. In the next step, random pieces of the fragmented DNA are sequenced to generate redundant amounts of sequence data. Finally, individual sequence reads are assembled to reconstruct the sequence of the analysed genome (Green 2001). Two different strategies using shotgun sequencing have been used in genome-sequencing projects (◘ Fig. 4.1): (I) hierarchical shotgun sequencing and (II) whole-genome shotgun sequencing.
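The relation between the number of random reads and the redundancy of the resulting data can be illustrated with a small simulation. The following is only a minimal sketch under simplifying assumptions that are not part of the text above: reads have a fixed length, start positions are uniformly random, and sequencing errors are ignored. The function names `simulate_shotgun` and `expected_coverage` are hypothetical.

```python
import random


def simulate_shotgun(genome_length, read_length, n_reads, seed=0):
    """Place reads at random start positions and return the per-base coverage."""
    rng = random.Random(seed)
    coverage = [0] * genome_length
    for _ in range(n_reads):
        # Every read lies completely inside the genome.
        start = rng.randrange(genome_length - read_length + 1)
        for pos in range(start, start + read_length):
            coverage[pos] += 1
    return coverage


def expected_coverage(genome_length, read_length, n_reads):
    """Average coverage: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length


genome_length, read_length, n_reads = 10_000, 500, 200
coverage = simulate_shotgun(genome_length, read_length, n_reads)
mean_coverage = sum(coverage) / genome_length
uncovered = sum(1 for c in coverage if c == 0)

print("expected coverage:", expected_coverage(genome_length, read_length, n_reads))
print("simulated mean coverage:", mean_coverage)
print("positions without any read:", uncovered)
```

Even when the average coverage is high, random sampling can leave some positions with few or no reads, which is one reason why redundant amounts of sequence data are generated.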

Fig. 4.1

Overview of shotgun-sequencing methods. a For hierarchical shotgun sequencing, large fragments of the original chromosome are cloned into BAC clones. BAC clones with overlapping fragments are chosen according to physical mapping information and fragmented into small fragments. BAC clone fragments are sequenced and assembled for each clone separately. Assembled contigs will be overlapped according to mapping information to give the final contig. b For whole-genome shotgun sequencing, chromosomes will be directly fragmented, without mapping information. Fragments will be sequenced and reads will be assembled into contigs

For hierarchical shotgun sequencing (◘ Fig. 4.1a), large fragments of DNA are cloned using bacterial artificial chromosomes (BACs). BACs are cloning vectors derived from Escherichia coli plasmids and have the advantage that the insertion of relatively large DNA fragments (>100–300 Kb) is possible (Shizuya et al. 1992). Alternatively, other cloning systems have been used, but less frequently than BACs. In a second step, a physical map of the cloned DNA is established. Various physical mapping approaches have been developed, including BAC restriction-based fingerprinting (Marra et al. 1997), iterative hybridization (Mozo et al. 1999), and the use of BAC-end sequences for connecting BAC clones by sequence identity (Mahairas et al. 1999). Restriction-based fingerprinting methods digest BAC clones with a set of restriction enzymes (e.g. two enzymes in the case of a double digest), thereby generating a set of differently sized fragments which can be visualized using gel electrophoresis. For each BAC, a unique pattern of bands on a gel is derived, and the presence and absence of fragment sizes can be scored. Finally, all BACs are ordered in relative position according to their similarity regarding shared fragment sizes (Soderlund et al. 1997). Based on this information, a minimal set of overlapping BACs which together completely cover a selected genomic region (minimal tiling path) is selected. For individual sequencing of BACs, their inserted DNA is purified and physically sheared to generate smaller fragments for sequencing. For Sanger sequencing, broken ends of the sheared fragments are enzymatically repaired, and all fragments are size fractionated using gel electrophoresis. Medium-sized (2–3 Kb) fragments are selected and cloned into sequencing vectors, which can finally be sequenced using conserved primer sites in the vector. A random collection of sequences of ~10x coverage is generated for each BAC, which can be used for BAC contig assembly.
Contigs for all BACs of the minimal tiling path are overlapped according to the information from the physical mapping to generate the final sequence, which should represent the sequenced genomic region. The first available larger eukaryote genomes, e.g. Arabidopsis thaliana (The Arabidopsis Genome Initiative 2000) and Caenorhabditis elegans (The C. elegans Sequencing Consortium 1998), have been sequenced with this approach. The International Human Genome Sequencing Consortium (2001) used hierarchical shotgun sequencing for the human genome.
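The idea of comparing BAC clones by shared fragment sizes can be sketched in a few lines. This is a deliberately naive illustration, not the scoring scheme of any published fingerprinting software: the relative size tolerance of 2%, the function name `shared_bands` and the example fingerprints are all assumptions made for this sketch.

```python
def shared_bands(fingerprint_a, fingerprint_b, tolerance=0.02):
    """Count bands of clone A that match a band of clone B in size.

    Two bands match if their sizes differ by at most `tolerance` (relative
    to the band of clone A). Each band of clone B is used at most once.
    """
    remaining = sorted(fingerprint_b)
    shared = 0
    for size in sorted(fingerprint_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance * size:
                shared += 1
                del remaining[index]
                break
    return shared


# Hypothetical fingerprints (fragment sizes in bp) of three BAC clones.
fingerprints = {
    "bac1": [1200, 2500, 4100, 7300],
    "bac2": [2510, 4100, 7300, 9800],
    "bac3": [560, 9790, 15000],
}

for a, b in [("bac1", "bac2"), ("bac2", "bac3"), ("bac1", "bac3")]:
    print(a, b, shared_bands(fingerprints[a], fingerprints[b]))
```

In this toy data set, bac1 and bac2 share three bands, bac2 and bac3 share one, and bac1 and bac3 share none, which suggests the relative order bac1, bac2, bac3.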

Whole-genome shotgun (wgs) sequencing (◘ Fig. 4.1b) directly involves sequencing of sheared genomic DNA, thereby leaving out the time-consuming step of establishing a physical map (Green 2001). When Sanger sequencing is used, sheared DNA is end repaired, subcloned into sequencing vectors and sequenced at high coverage. Assembly of this kind of sequence data usually leads to less continuous contigs, as the topological information from physical mapping is missing. Initially, this approach was mainly used for (small) bacterial genomes. Weber and Myers (1997) used simulations to demonstrate the practicability of wgs for sequencing large eukaryote genomes. Most famously, this was validated in practice by Craig Venter and colleagues by sequencing and assembling the human genome using wgs data (Venter et al. 2001).

Next-generation sequencing (NGS) techniques dramatically increased the output of sequencing reads, and wgs approaches became a standard. However, the most powerful methods in terms of sequence read output (Illumina, Ion Torrent) are also the methods producing the shortest reads (100–250 bp). The assembly of eukaryote genomes, which are often rich in repetitive sequences, became a particular challenge. One strategy to provide extra information for assembling wgs data is the use of mate-pair sequencing. Mate pairs describe the sequenced ends of DNA fragments separated by a specific size. For example, if the ends of a 3 Kb fragment are sequenced, the topological information that these sequences should be separated by roughly this size can be used to improve assemblies. Mate-pair libraries have been developed for all major short-read sequencing techniques (Illumina, 454, Ion Torrent), and even though details may vary, the principle remains the same. Most frequently, mate-pair sequencing is conducted with Illumina, and therefore details are explained for this method.

In the first step, genomic DNA is sheared into fragments of the desired size (◘ Fig. 4.2a). Typical sizes for mate pair libraries range from 2 to 5 Kb, even though larger libraries (5 to 25 Kb) are also feasible (van Heesch et al. 2013). DNA fragments are end repaired and the 3′-ends are labelled with biotin (◘ Fig. 4.2b). The B-vitamin biotin is widely used in molecular biology and can be covalently attached to proteins or nucleic acids. Biotin binds with high specificity and very fast to streptavidin. Magnetic beads covered with this protein can be used to specifically enrich biotinylated molecules. The size of prepared fragments can be selected using agarose gel electrophoresis, and size information is essential for subsequent computational analysis. Biotinylated fragments are circularized by intramolecular ligation (◘ Fig. 4.2c), and remaining linear molecules are enzymatically removed. The circularized DNA molecules are sheared again into a size of ~500 bp (◘ Fig. 4.2d). The fragments containing the biotinylated ends are selected using streptavidin-covered magnetic beads (◘ Fig. 4.2e), and remaining fragments are washed away. The selected fragments contain the 3′-ends of the original DNA fragments. Finally, sequencing adaptors are attached to the selected fragments to prepare the sequencing library (◘ Fig. 4.2f). Sequencing of these fragments generates read pairs which align towards the ends of the original size-selected fragment and are outward facing from each other. The gap between these reads is approximately of the size of the original fragment, and this information is valuable for contig assembly and scaffolding of genomes (Chaisson et al. 2009).
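How the known fragment size helps during scaffolding can be shown with a small calculation. The sketch below assumes a simplified situation that is not spelled out in the text: two contigs A and B are already known to be adjacent and in the same orientation, one read of each mate pair maps to A and the other to B, and the mapping coordinates mark the two ends of the original size-selected fragment. The function name `estimate_gap` and all numbers are hypothetical.

```python
from statistics import median


def estimate_gap(pairs, length_a, fragment_size):
    """Estimate the gap between contig A and contig B from mate pairs.

    pairs: list of (pos_a, pos_b), where pos_a is the coordinate on contig A
    at which the original fragment starts and pos_b is the coordinate on
    contig B at which it ends.
    """
    gaps = []
    for pos_a, pos_b in pairs:
        covered_on_a = length_a - pos_a  # part of the fragment inside contig A
        covered_on_b = pos_b             # part of the fragment inside contig B
        gaps.append(fragment_size - covered_on_a - covered_on_b)
    # The median is less sensitive to single mismapped pairs than the mean.
    return median(gaps)


# Hypothetical 3 Kb mate-pair library linking a 10 Kb contig to its neighbour.
pairs = [(9000, 500), (9500, 1100), (8800, 400)]
print(estimate_gap(pairs, length_a=10_000, fragment_size=3000))
```

Because the size selection on the gel is never exact, each pair gives a slightly different estimate, which is why several pairs are combined.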

Fig. 4.2

Construction of mate-pair libraries (Illumina). a DNA is sheared into fragments. b DNA fragments are end repaired and biotinylated. c Biotinylated fragments are circularized. d Circularized DNA molecules are sheared into ~500 bp fragments. e Fragments containing biotinylated ends are selected using streptavidin-covered magnetic beads; remaining fragments are washed away. f Adaptors for sequencing are ligated to selected fragments

Mapping strategies have been developed to improve and validate wgs assemblies, e.g. optical mapping (Schwarz et al. 2014). This method is similar to the restriction-based fingerprinting approach described above. For optical mapping, large DNA molecules are immobilized on a surface and digested with one or more restriction enzymes (◘ Fig. 4.3). The digested DNA molecules are stained with a fluorescent dye. The length between adjoining cut sites is estimated by measuring the fluorescence intensity. Mapping data of each single DNA molecule is used to produce a consensus genomic optical map, which includes an ordered series of DNA fragment sizes (Mendelowitz and Pop 2014). Recently, a high-throughput method of optical mapping using nanochannels has been proposed (Lam et al. 2012). With this approach, DNA fragments are nicked by an enzyme at specific sequence sites and subsequently fluorescently labelled. With the help of an electric field, molecules are driven through a nanoscale channel, where the DNA is stretched. In this channel, distances between fluorescent labels can be measured using a microscope. A unique optical pattern resembling a barcode is created by the distance measure of the labels (Michaeli and Ebenstein 2012).
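To compare an assembly with an optical map, the assembled sequence can be digested in silico to obtain the expected ordered series of fragment sizes. The following minimal sketch assumes a single enzyme, a linear molecule and complete digestion. The function name `restriction_map` and the toy sequence are hypothetical. EcoRI (recognition site GAATTC, cutting after the first base) serves only as an example.

```python
def restriction_map(sequence, site, cut_offset=0):
    """Return the ordered fragment sizes after complete digestion.

    cut_offset is the position of the cut within the recognition site,
    e.g. 1 for EcoRI, which cuts between G and AATTC.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy sequence with two EcoRI sites.
sequence = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
print(restriction_map(sequence, "GAATTC", cut_offset=1))
```

The resulting list of sizes plays the role of the pattern that is measured experimentally from the stained fragments.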

Fig. 4.3

A workflow for optical mapping (By Fong Chun Chan and Kendric Wang (Own work) [CC BY 3.0 (► http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons)

A mapping strategy which recently became popular has been commercialized by Dovetail Genomics and is based on a Hi-C approach (Lieberman-Aiden et al. 2009). The idea behind Hi-C is that, after fixation of chromatin structure, DNA segments which are in close proximity in the nucleus are more likely to be ligated together. This is reflected by the finding that the number of intra-chromosomal ligation pairs decreases as the genomic distance between them increases. With the so-called cHiCago protocol, Hi-C mapping is used for the localization of chromatin interactions to infer the relative order and orientation of contigs (Putnam et al. 2016). Using this protocol, chromatin is reconstituted in vitro and fixed with formaldehyde. The fixed chromatin is then cut with a restriction enzyme, thereby generating free sticky ends, which are filled with biotinylated and thiolated nucleotides. In the next step, free blunt ends are ligated, and the chromatin crosslinks are reversed to generate ligation mate pairs, which are fusions of fragments which are distantly located in the genome. After library preparation, these fragments can be sequenced with NGS methods. The mapping of these fragments helps to dramatically improve genome assemblies based on various NGS techniques (e.g. Illumina, PacBio). For example, by using the cHiCago protocol, the scaffold N50 of the Illumina-based genome assembly of the American alligator could be increased from 508 Kb to 10 Mb (Putnam et al. 2016) (◘ Fig. 4.4).
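The N50 value used above to describe assembly contiguity is the length of the shortest sequence in the smallest set of longest sequences that together contain at least half of the assembled bases. A minimal sketch of the calculation follows. The function name `n50` and the example lengths are hypothetical and unrelated to the alligator assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop as soon as half of the assembled bases are reached.
        if 2 * running >= total:
            return length


# Hypothetical assembly before and after joining contigs into scaffolds.
contigs = [100, 80, 60, 40, 20]
scaffolds = [180, 100, 20]  # 100 + 80 and 60 + 40 have been joined
print(n50(contigs), n50(scaffolds))
```

The total assembly length is the same in both cases; only the way the sequence is partitioned changes, and with it the N50.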

Fig. 4.4

Diagram of the cHiCago library preparation protocol as used by Dovetail Genomics. a Chromatin (nucleosomes in blue) is reconstituted in vitro upon naked DNA (black strand). b Fixation of chromatin by formaldehyde. Red lines indicate crosslinks. c Cutting of fixed chromatin using restriction enzymes. d Filling of sticky ends with biotinylated (blue circles) and thiolated (green squares) nucleotides. e Ligation of free blunt ends (red asterisks). f Fragments for library preparation are yielded by reversion of crosslinks and removal of proteins. Terminal biotinylated nucleotides are removed (Reprinted from Putnam et al. (2016))

A different way to improve wgs assemblies is by using long sequencing reads. This can be directly done by sequencing with third-generation techniques such as single-molecule real-time sequencing or nanopore sequencing. Alternatively, long reads can also be generated synthetically from Illumina short-read sequencing. Illumina itself distributes a technique called TruSeq, which was formerly known under the name Moleculo. With this approach, ~10 Kb DNA fragments are amplified and barcoded before sequencing, and long reads can be created afterwards based on this information. The company 10X Genomics released an instrument called Chromium which uses a similar but more powerful approach for the generation of synthetic long reads. DNA fragments of up to 100 Kb are amplified and barcoded with an emulsion PCR step. Subsequently, these fragments are sequenced at very low coverage, and sequenced barcodes localize clouds of short reads which are used to scaffold de novo assemblies (Lee et al. 2016). The advantage of both these methods is their considerably lower price compared to true long-read sequencing. However, synthetically generated «long» reads are prone to biases of the Illumina technology, e.g. little or no coverage in regions with high GC content. Also, tandem repeats are still difficult to tackle with this approach.

4.2 RADseq

Due to the advent of NGS techniques, genome sequencing became feasible and affordable even for non-model organisms and also smaller laboratories. However, for many studies, it is sufficient to analyse a snapshot of the genome, but for a high number of individuals. A set of related methods used to sequence a reduced, but consistent representation of the genome is known as restriction site-associated DNA sequencing (RADseq). Applications of RADseq include discovery of genetic markers for phylogenetics and population genetics (Cruaud et al. 2014; Davey et al. 2011), mapping of quantitative trait loci (QTLs) (Houston et al. 2012), linkage mapping (Gonen et al. 2014) or local genome assembly (Etter et al. 2011). The name RADseq was introduced for one specific approach of reduced representation sequencing (Baird et al. 2008), but is now used to describe several similar methods (Andrews et al. 2016). Besides the original RADseq approach, this family includes methods like ddRAD (Peterson et al. 2012), ezRAD (Toonen et al. 2013), 2bRAD (Wang et al. 2012), and the widely used genotyping by sequencing (GBS) (Elshire et al. 2011).

The original RADseq protocol starts with the digestion of genomic DNA with one restriction enzyme (◘ Fig. 4.5a). Restriction enzymes are able to cleave DNA at either random (type I) or specific (type II) positions. The first restriction enzyme cutting a specific sequence motif (HindII) was isolated from the bacterium Haemophilus influenzae (Smith and Wilcox 1970). Since that time, several thousand restriction enzymes (targeting different sequence motifs) have been described, and hundreds are commercially available. A list of available restriction enzymes and their properties is collected in the database REBASE (Roberts et al. 2015). The choice of the restriction enzyme greatly influences into how many pieces the genome is cut. As a rule of thumb, the longer the recognized sequence motif, the fewer fragments are generated. For example, a six-base pair motif as recognized by the EcoRI enzyme (◘ Fig. 4.5a) will cut on average every 4,000 bp, whereas an eight-base pair motif would only cut every 65,500 bp (Andrews et al. 2016). These numbers are rough estimates and are greatly influenced by the base composition of the investigated genome. Restriction enzymes can either cut symmetrically, thereby generating blunt ends, or asymmetrically. By using an asymmetrically cutting enzyme, all fragments will bear so-called sticky ends, which describe the overhang created by cutting with the restriction enzyme. An adaptor can be ligated to these sticky ends, which includes a known primer site for PCR amplification (◘ Fig. 4.5b). If adaptors bearing unique barcode sequences are used, multiple libraries can be mixed at this point (multiplexing). This barcode will be read during sequencing and allows the separation of multiplexed samples. The complete DNA library will be sheared, followed by repair of sequence ends. Using blunt-end ligation, a second adaptor is ligated to all fragments (◘ Fig. 4.5c). This second adaptor is Y-shaped, containing an only partially overlapping sequence.
The resulting DNA library will be amplified using a primer pair (e.g. P1 and P2) (◘ Fig. 4.5d). One sequencing primer site is nested in the first adaptor (P1). The second primer site is identical to one of the nonoverlapping sequence parts of the Y-shaped adaptors (P2). The Y-shaped adaptor is completed when fragments containing the first adaptor are bound by P1 and copied. Primer P2 only binds to the Y-shaped adaptor after completion. Thereby, specificity of the amplification is enhanced, as only fragments containing both adaptors are amplified (◘ Fig. 4.5d). The enriched library can be sequenced using NGS. With this method, thousands of single-nucleotide polymorphism (SNP) loci can be generated (Davey et al. 2011).
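The rule of thumb relating motif length to cutting frequency follows from simple probability. If all four bases were equally frequent and independent of each other, a motif of n bases would be expected once every 4^n bp. The sketch below also accepts a GC content to show how base composition shifts the estimate. The independence of neighbouring bases is a simplifying assumption, and the function name `expected_cut_interval` is hypothetical.

```python
def expected_cut_interval(site, gc_content=0.5):
    """Expected distance (bp) between occurrences of a recognition motif.

    Assumes independent bases with frequencies G = C = gc_content / 2
    and A = T = (1 - gc_content) / 2.
    """
    probability = 1.0
    for base in site.upper():
        if base in "GC":
            probability *= gc_content / 2
        else:
            probability *= (1 - gc_content) / 2
    return 1 / probability


print(expected_cut_interval("GAATTC"))                         # six-base motif (EcoRI)
print(expected_cut_interval("GCGGCCGC"))                       # eight-base motif (NotI)
print(round(expected_cut_interval("GAATTC", gc_content=0.35)))  # AT-rich genome
```

With equal base frequencies the two motifs give 4,096 bp and 65,536 bp, in line with the rounded values given above. In an AT-rich genome, the AT-rich EcoRI motif is expected to occur more often.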

Fig. 4.5

Workflow of the original RADseq protocol. a Genomic DNA is cut with a chosen restriction enzyme (in this example EcoRI) for fragmentation. b Using the overhang as created by the restriction enzyme, an adaptor is ligated to sequence fragments. The complete pool of DNA is sheared mechanically. c Y-shaped adaptors are ligated to the sheared pool of DNA fragments. d Using priming sites in both adaptors, the DNA library is amplified. Only fragments containing both adaptors can be successfully amplified

Several variants of the original RADseq protocol have been developed (see above), which differ in details of restriction enzyme digestion, size selection or adaptor ligation (Andrews et al. 2016). Commonly used alternative protocols are ddRAD and GBS. In the case of double digest RADseq (ddRAD), two different restriction enzymes are utilized to digest the genomic DNA (Peterson et al. 2012). Adaptors are ligated to each cut site, and size selection is facilitated by choosing those fragments which are flanked by restriction enzyme recognition sites that are neither too close nor too distant (◘ Fig. 4.6). Using this method, all reads of a given locus share the same fragment size, as no shearing step is involved. Moreover, size selection further decreases the number of analysed loci, which in turn increases the coverage in terms of sequence reads. In contrast, in the case of RADseq (see above), each sequenced fragment has a cut site at one end and a randomly sheared end at the other. Thereby a range of fragment sizes is produced for each locus (Andrews et al. 2016).
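The effect of a double digest followed by size selection can be sketched in silico. The example below makes two assumptions that go beyond the text: only fragments flanked by cut sites of the two different enzymes are kept (as is usual when each adaptor matches only one of the two overhangs), and digestion is complete. The function names, the toy sequence and the size window are hypothetical, and EcoRI (G^AATTC) and MspI (C^CGG) serve only as example enzymes.

```python
def find_cuts(sequence, site, cut_offset, label):
    """Return (cut position, enzyme label) for every occurrence of a motif."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append((start + cut_offset, label))
        start = sequence.find(site, start + 1)
    return cuts


def ddrad_fragments(sequence, enzyme_a, enzyme_b, min_size, max_size):
    """Return (start, end, size) of fragments kept after double digest and size selection."""
    cuts = sorted(
        find_cuts(sequence, enzyme_a[0], enzyme_a[1], "A")
        + find_cuts(sequence, enzyme_b[0], enzyme_b[1], "B")
    )
    selected = []
    for (left, left_label), (right, right_label) in zip(cuts, cuts[1:]):
        size = right - left
        # Keep fragments with one end from each enzyme and a suitable size.
        if left_label != right_label and min_size <= size <= max_size:
            selected.append((left, right, size))
    return selected


sequence = (
    "A" * 10 + "GAATTC" + "T" * 20 + "CCGG" + "A" * 3
    + "GAATTC" + "T" * 50 + "CCGG" + "A" * 5
)
print(ddrad_fragments(sequence, ("GAATTC", 1), ("CCGG", 1), min_size=20, max_size=40))
```

Of the three fragments lying between cut sites, one is too short and one is too long for the chosen window, so only a single locus remains.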

Fig. 4.6

Comparison of analysed loci by RADseq a and ddRADseq b. In the case of ddRADseq b, size selection excludes regions flanked by either [a] very close or [b] very distant restriction enzyme recognition sites (Figure from Peterson et al. (2012))

GBS is basically a simplified protocol of the RADseq approaches described above. DNA is digested with one restriction enzyme, and a pair of adaptors is ligated to each fragment. One adaptor contains a barcode unique for each library (e.g. for single individuals); the other adaptor is a common adaptor used in all libraries (Elshire et al. 2011). Subsequently, all libraries are pooled and a PCR is performed with primer sites nesting in the ligated adaptors. The pooled and amplified library can be sequenced using NGS. Modifications of this simple protocol, using two restriction enzymes and Y-shaped adaptors, have been published (Poland et al. 2012). GBS approaches have been especially widely used for SNP discovery in large plant genomes (Deschamps et al. 2012), but also for population genomic analyses (Friis et al. 2016).
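After sequencing a pooled library, reads have to be assigned back to their samples by means of the barcode. A minimal sketch of this demultiplexing step is given below. It assumes that the barcode is located at the very start of each read, that only exact matches are accepted and that no barcode is a prefix of another one. The function name `demultiplex`, the barcodes and the reads are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by their leading barcode.

    barcodes: dict mapping barcode sequence -> sample name.
    Returns (dict sample -> list of reads with the barcode removed,
             list of reads without a known barcode).
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            unassigned.append(read)
    return by_sample, unassigned


barcodes = {"ACGT": "individual_1", "TGCA": "individual_2"}
reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAAA", "NNNNGGGG"]
by_sample, unassigned = demultiplex(reads, barcodes)
print(by_sample)
print(unassigned)
```

Tools used in practice usually tolerate a limited number of sequencing errors in the barcode, which this sketch does not.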

The number of loci identified by RADseq methods is influenced by the frequency of cut sites of the chosen restriction enzymes, size selection (if applied), genome size of the target organism and chosen RADseq method. If a reference genome is available, in silico analyses can be performed to optimize RADseq experiments (Lepais and Weir 2014). Such analyses are used to predict the number of retrieved loci given the choice of restriction enzyme or based on alternative methods. Even though in many cases there are no reference genomes available, genome-wide surveys of frequencies of restriction enzyme recognition sequences show a high variability across eukaryotic taxonomic groups (Herrera et al. 2015). The frequency of these cleavage sites seems to be similar among closely related species, which helps to choose enzymes for RADseq experiments with organisms lacking a reference genome. Moreover, as RADseq methods differ in costs and hands-on time in the lab, these factors further influence the number of samples which can be analysed. Pooling samples without using individually barcoded adaptors is a cost-efficient alternative, but may prohibit some downstream population genetic analyses (Futschik and Schlötterer 2010; Andrews et al. 2014).

Advantages and disadvantages of different RADseq methods have been discussed in detail (Puritz et al. 2014; Andrews et al. 2014; Andrews et al. 2016). Several biases due to methodological artefacts may influence the analysis of RADseq data in general. A common problem is the introduction of PCR duplicates. These duplicates do not represent independent samples from the analysed genomic DNA pool. As independence of samples is an underlying assumption of most population genetic analyses, this may result in skewing allele frequencies, genotyping errors or false-positive alleles (Andrews et al. 2014). Putative PCR duplicates can be identified when using RADseq methods that include a random-shearing step, as in the original RADseq protocol (see above). By analysing paired-end sequence reads, PCR duplicates can be identified as fragments that are identical across forward and reverse reads (Davey et al. 2011). Additional sources of bias introduced during PCR are preferential amplification of loci based on GC content and fragment size, which may impact the variance of sequence read coverage across loci (Puritz et al. 2014). Critical for all RADseq methods are problems due to non-random sampling leading to systematic underestimation of polymorphisms (Arnold et al. 2013; Huang and Knowles 2014). Non-random sampling results from polymorphic recognition sequences of the used restriction enzymes, resulting in missing data for some chromosomes/individuals (allelic dropout).
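The identification of putative PCR duplicates from paired-end data can be sketched as follows. The example treats two read pairs as duplicates only if both the forward and the reverse read are exactly identical, which ignores sequencing errors. The function name `remove_pcr_duplicates` and the reads are hypothetical.

```python
def remove_pcr_duplicates(read_pairs):
    """Collapse read pairs that are identical in forward and reverse read.

    read_pairs: list of (forward_read, reverse_read) tuples.
    Returns (list of unique pairs in order of first appearance,
             number of pairs flagged as putative duplicates).
    """
    unique = list(dict.fromkeys(read_pairs))
    return unique, len(read_pairs) - len(unique)


read_pairs = [
    ("ACGTAC", "TTGACA"),
    ("ACGTAC", "TTGACA"),  # identical in both reads: putative PCR duplicate
    ("ACGTAC", "TTGCAA"),  # same cut site, different sheared end: kept
    ("GGCATT", "TTGACA"),
]
unique, n_duplicates = remove_pcr_duplicates(read_pairs)
print(len(unique), n_duplicates)
```

The third pair shows why the random-shearing step matters: fragments from the same cut site that were sheared at different positions differ in their reverse read and are therefore recognized as independent molecules.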

4.3 Hybrid Enrichment

Hybrid enrichment methods are used for the specific capture and enrichment of selected sequences (Lemmon and Lemmon 2013). In short, capture probes (DNA or RNA) that are complementary to targeted regions in the genome are hybridized to a DNA library, and target DNA is enriched by washing away nontargeted DNA prior to high-throughput sequencing. This method has been used to enrich selected single-copy orthologous loci for phylogenetic analyses, as in anchored hybrid enrichment (AHE) (Lemmon et al. 2012) or enrichment of ultraconserved elements (UCE) (Faircloth et al. 2012). Moreover, it is widely used for the enrichment of exonic DNA (Li et al. 2013) or organelle DNA (Briggs et al. 2009). Prior to the enrichment, long oligonucleotides (usually ∼60–120 bp) which cover the target regions have to be designed and synthesized. For this purpose, genomic or transcriptomic resources of the target species or closely related species are used as a reference. In the case of AHE, and when targeting UCEs, it has been shown that capture probes could even be successfully designed for vertebrates across multiple evolutionary timescales, in some cases spanning divergence times of ~500 million years (Lemmon et al. 2012; Faircloth et al. 2012). Capture probes can be designed for several hundred to thousands of loci in parallel, which may involve several thousand oligonucleotides. Most target enrichment applications follow a solution-based enrichment protocol (sometimes with modifications) as developed by Gnirke et al. (2009) (◘ Fig. 4.7). Designed oligonucleotides are synthesized on a microarray (Lipshutz et al. 1999), cleaved and eluted. After initial PCR, a T7 promoter sequence is added to the double-stranded DNA. This promoter can be used to transcribe DNA to RNA with the help of T7 RNA polymerase. This polymerase is promoter specific in only transcribing double-stranded DNA downstream of a T7 promoter sequence (Studier and Moffatt 1986). 
The transcription takes place in the presence of biotin-UTPs, thereby generating biotinylated single-stranded RNA capture baits (◘ Fig. 4.7a). Meanwhile, genomic DNA of the target organism is sheared, end repaired, adaptor ligated (grey) and PCR amplified (◘ Fig. 4.7b). Capture of targets takes place in solution. For this purpose, strands of genomic DNA are separated and hybridized with the prepared biotinylated RNA baits (◘ Fig. 4.7c). After hybridization, target DNA (and unbound probes) can be captured using magnetic streptavidin-coated beads (◘ Fig. 4.7d). Unbound DNA is washed away, whereas captured and thereby enriched target DNA is eluted, PCR amplified and ready to be sequenced using NGS platforms (◘ Fig. 4.7e).
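Designing oligonucleotides that cover a target region is often done by tiling the region with overlapping windows. The text does not prescribe a tiling scheme, so the following is only an illustrative sketch: the bait length of 120 nt, the step size of 60 nt (two-fold tiling) and the function name `tile_baits` are assumptions, and steps such as filtering repetitive baits are omitted.

```python
def tile_baits(target, bait_length=120, step=60):
    """Return (start, bait sequence) tuples tiling a target region.

    A final bait is added at the end of the target if the regular steps
    would leave the last bases uncovered.
    """
    if len(target) <= bait_length:
        return [(0, target)]
    last_start = len(target) - bait_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(start, target[start:start + bait_length]) for start in starts]


# Hypothetical 310 bp target region.
target = "ACGTTGCA" * 38 + "ACGTTG"
baits = tile_baits(target)
print([start for start, _ in baits])
```

For the 310 bp example, regular steps give baits starting at positions 0, 60, 120 and 180, and an additional bait at position 190 covers the remaining bases.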

Fig. 4.7

Principle of solution hybrid selection. Colours represent differently targeted DNA regions. Black diamonds represent biotin label. a Long oligonucleotides are synthesized on a microarray, cleaved and eluted. After initial PCR, a T7 promoter is added to double-stranded DNA. In the presence of biotin-UTP, biotinylated single-stranded RNA baits are generated (milky lines with black diamonds). b Genomic DNA of the target organism is sheared, end repaired, adaptor ligated (grey) and PCR amplified. c Strands of genomic DNA are separated and hybridized in solution with biotinylated RNA baits. d Free biotinylated RNA baits and those hybridizing to target DNA are captured using streptavidin-coated magnetic beads. e Captured DNA fragments are eluted and amplified by PCR

Two approaches in particular have become widely used for phylogenomic studies. Anchored hybrid enrichment as introduced by Lemmon et al. (2012) identifies conserved DNA regions flanked by less conserved regions for probe design. Usually alignments of genomically well-characterized model species are exploited to design oligonucleotides. AHE has been mostly used for phylogenetic analyses of different groups of vertebrates (Prum et al. 2015; Eytan et al. 2015; Ruane et al. 2015). Faircloth et al. (2012) targeted UCEs, which were initially described as perfectly conserved segments of mammalian genomes which are not functionally transcribed (Dermitzakis et al. 2005). Such regions have also been described in other animals, as well as in plants and fungi (Siepel et al. 2005; Zheng and Zhang 2008). Using UCEs has the advantage that a set of loci can be characterized in highly divergent reference genomes and later applied to a diverse set of taxa, without the need to design new probes each time (Jones and Good 2016). As UCEs are often flanked by variable regions, this method also works across shallow evolutionary timescales as, for example, demonstrated in the phylogenetic analysis of a cichlid radiation (McGee et al. 2016).

Hybridization enrichment strategies have also been successfully used when working with ancient DNA. Often only a very low level of endogenous DNA is preserved in ancient specimens (1–2%), while the majority represents environmental DNA (Carpenter et al. 2013). Moreover, the DNA is normally highly degraded, and only short and also damaged fragments are present. Consequently, wgs approaches might not be effective and might be too costly when dealing with ancient DNA. Fu et al. (2013) developed capture probes targeting the complete mitochondrial genome and representative portions from the nuclear genome in ancient humans. It was furthermore possible to sequence complete mitochondrial genomes from the oldest ancient humans investigated so far (> ~300,000 years ago) (Meyer et al. 2014). This method has also been demonstrated to work with highly degraded and ultrashort DNA in non-permafrost-preserved cave bears from the Middle Pleistocene (Dabney et al. 2013). Target capture of mitochondrial genomes in permafrost-preserved horse fossils even allowed the analysis of specimens dated to 560,000 to 780,000 years ago (Orlando et al. 2013).

As an alternative to in-solution hybridization methods, capture can take place directly on a microarray (Albert et al. 2007). DNA microarrays were initially used to study gene expression patterns (Schena et al. 1995), an application which is now more and more supplanted by RNA-Seq (see below). DNA microarrays are a collection of DNA sequences which are attached to a surface (e.g. glass). Specific PCR products or designed oligonucleotides can be printed at specified sites on glass slides using high-precision arraying robots (Schulze and Downward 2001). Complementary DNA can be directly hybridized to DNA microarrays and thereby captured. If this DNA is fluorescently labelled, the intensity of bound DNA can be measured, e.g. to infer the relative expression of mRNA. In the case of hybridization enrichment, genomic DNA is sheared, adaptor ligated, amplified and hybridized with the array (Albert et al. 2007). Non-hybridized DNA is washed away, while the captured (and thereby enriched) DNA fragments are eluted and prepared for subsequent NGS library preparation. Liu et al. (2016) demonstrated the successful enrichment of mitochondrial genomes of insects using such a microarray capture approach.

4.4 Expressed Sequence Tags and RNA-Seq

The transcriptome comprises the complete set of transcripts, as well as their quantity, of a cell or population of cells. Several technologies are available to sequence and quantify the transcriptome, including hybridization-based approaches using microarrays (see above) or direct sequencing (Wang et al. 2009). Using Sanger-based techniques, sequencing of expressed sequence tags (ESTs) was established in the 1990s to characterize transcriptomes (Adams et al. 1991), even though the lack of sequencing power usually did not allow the quantification of gene expression. By harnessing the power of NGS techniques, RNA-Seq became the method of choice to sequence transcriptomes and to determine gene expression levels. In general, for both methods RNA is reverse transcribed to a library of cDNA fragments. The RNA can be total, selected for transcripts carrying a poly-A-tail or depleted in ribosomal RNA. Similarly, specific libraries targeting small RNAs (e.g. tRNAs, microRNAs) can be constructed. For EST sequencing, cDNA is cloned into an appropriate vector, which is sequenced from both ends. Alternatively, directional cloning of cDNA is possible, so that only the 5′-ends of the sequences are sequenced, thereby avoiding poly-A-tail sequences. Sequencing takes place with the Sanger technique, and usually a few hundred to a few thousand transcript ends are manageable. This method played an important role in gene discovery (Schuler 1997) and also paved the way for the first broadscale phylogenomic studies in animals (Dunn et al. 2008). With dbEST, an entire database hosted by NCBI GenBank is dedicated to EST sequences (Boguski et al. 1993).

Transcriptome sequencing by RNA-Seq exploits available NGS high-throughput technologies (Wang et al. 2009). As for EST sequencing, RNA is first converted to a cDNA library. The cDNA fragments are then prepared for NGS methods by attaching adaptors to both ends. The library is finally sequenced in a high-throughput manner to obtain a high coverage of short sequence reads. RNA-Seq can be used for transcriptome assembly and expression profiling at the same time. Especially for non-model organisms, RNA-Seq became the method of choice for de novo transcriptome assembly, gene discovery and gene expression comparisons (Ekblom and Galindo 2011; McCormack et al. 2013; Todd et al. 2016). By using RNA-Seq, hundreds to thousands of putatively orthologous genes can be discovered, and transcriptome-based phylogenomic analyses thereby became the state of the art for understanding animal evolution (Telford et al. 2015; Dunn et al. 2014). Moreover, RNA-Seq is a powerful tool for gene expression analyses. The expression level of a gene is measured by the number of sequenced fragments that map back to each transcript. For RNA-Seq, abundance levels are commonly given as reads per kilobase of transcript per million mapped reads (RPKM) (Mortazavi et al. 2008). Compared to microarray studies, the RNA-Seq approach offers several advantages (◘ Table 4.1), e.g. identification of gene isoforms and allele-specific expression, nucleotide polymorphisms and post-transcriptional base modifications (Malone and Oliver 2011; Rapaport et al. 2013). Importantly, this approach also enabled comparative gene expression studies for organisms where reference genomes or transcriptomes are missing (Todd et al. 2016). Consequently, RNA-Seq became a powerful approach to study differential gene expression, which aims to investigate qualitative and quantitative differences of genes expressed in different cell types (Gilbert 2013).
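The RPKM measure (reads per kilobase of transcript per million mapped reads; Mortazavi et al. 2008) can be illustrated with a minimal Python sketch. The gene names, read counts, transcript lengths and library size below are hypothetical values chosen for illustration only, and the function name rpkm is an assumption, not part of any particular software package.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 10^9 / (transcript length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: (mapped reads, transcript length in bp) per gene.
counts = {
    "geneA": (500, 1000),
    "geneB": (1000, 2000),
}
total_mapped_reads = 20_000_000  # hypothetical library size

values = {
    gene: rpkm(reads, length, total_mapped_reads)
    for gene, (reads, length) in counts.items()
}

for gene, value in values.items():
    print(gene, value)
```

In this toy example geneB attracts twice as many reads as geneA but is also twice as long, so both genes receive the same RPKM value. This is the purpose of the normalization: read counts are made comparable across transcripts of different length and across libraries of different size.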

Table 4.1

Comparison of different methods for investigating gene expression (partly adapted from Wang et al. (2009))

	Microarray	ESTs	RNA-Seq
Principle	Hybridization	Sanger	NGS (e.g. Illumina)
Resolution	Several to 100 bp	Single base pair	Single base pair
Throughput	High	Low	High
Prior genomic resources	Required	Not required	Not required
Isoform distinction	No	Yes	Yes
Allelic expression	No	Yes	Yes

As powerful and straightforward as the counting of mapped reads appears, several pitfalls have to be avoided when working with RNA-Seq data (Tarazona et al. 2011; Vijay et al. 2013). The expression signal of any given transcript is limited by the sequencing depth and thereby also depends on the expression level of other transcripts (Rapaport et al. 2013). Additionally, there is a transcript length bias, as more reads map to long transcripts than to short transcripts of similar expression (Oshlack and Wakefield 2009). Consequently, the probability of detecting the presence, as well as the differential expression, of a given transcript varies strongly. Biological variance in gene expression due to genetic or environmental differences can further complicate RNA-Seq analyses (Todd et al. 2016). Finally, bias can be introduced by technical differences between sequencing runs (or even lanes of a single flow cell) or between library preparations (McIntyre et al. 2011). To deal with these problems, gene expression experiments should be designed carefully. For example, increased sequencing depth may help to uncover lowly expressed variants and alleviate problems related to transcript length, but at the same time it also increases the number of false positives due to sequencing errors. As a rule of thumb, the larger the genome of the analysed species, the more complex its transcriptome. For «simple» yeast transcriptomes, it was shown that with 30 million short (35 bp) reads the expression of >90% of the expected transcripts could be detected (Wang et al. 2009). For the more «complex» chicken transcriptome, a similar number (~30 million) of medium-sized reads (75 bp) was enough to detect 90% of all annotated genes, and even with 10 million reads, 80% of the genes could be detected (Wang et al. 2011). After reviewing gene expression studies across diverse sets of eukaryotes, Todd et al. (2016) recommend 5 to 20 million mapped reads per sample as a sufficient sequencing depth. There is also a trade-off between the number of biological replicates to be sequenced and their costs. Such replicates can improve estimates of variance for different sources of bias and are necessary to quantify biological variation. It has been shown that increasing the number of biological replicates has a stronger positive effect on the statistical power of differential gene expression experiments than increasing the sequencing depth for each sample (Liu et al. 2014). Useful guidelines for the design of RNA-Seq experiments in the context of evolutionary and ecological research questions are given by Wolf (2013) and Todd et al. (2016).
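The dependence of detection on sequencing depth and transcript length can be made concrete with a small Python sketch. It rests on two simplifying assumptions that are not taken from the studies cited above: the share of reads derived from a transcript is proportional to its molar abundance multiplied by its length, and the number of reads sampled from a transcript follows a Poisson distribution. All transcript names and numbers are hypothetical.

```python
import math


def read_shares(transcripts):
    """Expected fraction of all reads contributed by each transcript.

    transcripts maps a name to (relative molar abundance, length in bp).
    Assumption: reads are sampled in proportion to abundance * length.
    """
    weights = {
        name: abundance * length
        for name, (abundance, length) in transcripts.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def detection_probability(share, total_reads, min_reads=1):
    """Probability of sampling at least min_reads reads (Poisson model)."""
    mean = share * total_reads
    p_below = sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


# Hypothetical transcriptome: two equally rare transcripts of different
# length and one highly expressed transcript that dominates the library.
transcripts = {
    "short_rare": (1, 500),
    "long_rare": (1, 5000),
    "abundant": (10000, 2000),
}
shares = read_shares(transcripts)

for depth in (10_000, 100_000, 1_000_000):
    for name in ("short_rare", "long_rare"):
        p = detection_probability(shares[name], depth)
        print(f"depth={depth:>9,} {name:<10} P(detected)={p:.3f}")
```

Under these assumptions the long transcript is detected at a much shallower depth than the short one, although both are expressed at the same level, and both depend on how much of the library is taken up by the abundant transcript. Real data show additional variance, so the sketch is only meant to illustrate the direction of the biases.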

4.5 Single-Cell Genomics and Transcriptomics

Single-cell genomics and transcriptomics aim to study genetic diversity on a cellular level (Tang et al. 2011; Shapiro et al. 2013). Using these approaches, it is possible to study microbial ecosystems and cell lineage relationships or to connect genotypes with phenotypes on a single-cell level. However, the acquisition of high-quality single-cell sequencing data comes with major technical challenges: (1) physical isolation of individual cells, (2) amplification of the genome (or transcriptome) of single cells for downstream analyses and (3) analysing the data given the biases and errors introduced during the first two steps (Gawad et al. 2016). The isolation of individual cells can be facilitated by methods like serial dilution, microfluidics, micromanipulation, laser-capture microdissection or fluorescence-activated cell sorting (FACS) (Yilmaz and Singh 2012). Single cells have to be transferred to reaction tubes for subsequent DNA or RNA extraction. In the case of RNA, reverse transcription into cDNA is necessary. Currently, amplification of the DNA (or cDNA) of single cells is required to gain a sufficient amount of molecules for sequencing. However, in the near future, single-molecule sequencing as performed by third-generation sequencing platforms (PacBio, Oxford Nanopore) should make this step unnecessary. It is possible to sequence the transcriptome and genome of the same cell, as demonstrated by Macaulay et al. (2015).

Single-cell genomics has emerged as a powerful tool to recover genomic information from uncultured, individual cells of environmental microorganisms (Stepanauskas 2012). As this method recovers all genomic information of a given cell, both chromosomal and extrachromosomal elements are captured, which also allows possible viral infections to be detected. For example, Labonte et al. (2015) demonstrated that host-virus relationships in marine microbial communities can be investigated in this way. Furthermore, single-cell genomics helps to link the genotype of so far unculturable prokaryotes with metabolic functions derived from the annotation of their genomes. For example, the investigation of ubiquitous but uncultured Proteobacteria lineages sampled in the dark oxygenated ocean revealed potential chemolithoautotrophy, thereby providing a new perspective on carbon cycling of this large oceanic habitat (Swan et al. 2011).

Single-cell transcriptomic approaches have been successfully implemented for evolutionary developmental research. Lee et al. (2014) developed fluorescent in situ RNA sequencing (FISSEQ), a method where cDNA is directly sequenced within biological samples (tissue sections, whole-mount embryos). Alternatively, Achim et al. (2015) proposed to compare transcriptomes from single-cell sequencing (of cells with unknown spatial locations) with available expression profiles from a gene expression atlas. Using this method, >80% of cells could be allocated to precise locations in the brain of the model annelid Platynereis dumerilii. Ultimately, these methods will help to resolve the origin, features and fate of different cell types in complex tissues (Satija et al. 2015).
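The general idea of matching a single-cell transcriptome against a gene expression atlas can be sketched in a few lines of Python. This is a deliberately simplified toy illustration and not the scoring scheme of Achim et al. (2015): expression is reduced to presence or absence with an arbitrary threshold, each atlas position is scored by the number of genes on which it agrees with the cell, and the gene names, region names and values are all hypothetical.

```python
def binarize(expression, threshold=1.0):
    """Reduce expression values to presence/absence calls.

    The threshold is an arbitrary, assumed cut-off.
    """
    return {gene: value >= threshold for gene, value in expression.items()}


def match_score(cell_binary, atlas_profile):
    """Number of shared genes on which the cell and an atlas position agree."""
    shared_genes = cell_binary.keys() & atlas_profile.keys()
    return sum(cell_binary[gene] == atlas_profile[gene] for gene in shared_genes)


def best_position(cell_expression, atlas, threshold=1.0):
    """Return the atlas position whose binary profile best matches the cell."""
    cell_binary = binarize(cell_expression, threshold)
    return max(atlas, key=lambda position: match_score(cell_binary, atlas[position]))


# Hypothetical atlas: presence/absence of three marker genes at two positions.
atlas = {
    "region_1": {"geneA": True, "geneB": False, "geneC": True},
    "region_2": {"geneA": False, "geneB": True, "geneC": True},
}

# Hypothetical single-cell expression values (arbitrary units).
cell = {"geneA": 5.2, "geneB": 0.0, "geneC": 3.1}

print(best_position(cell, atlas))
```

A realistic analysis would have to weight genes by how informative they are and account for the dropout of lowly expressed transcripts in single-cell data, which this sketch ignores.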

References

Achim K, Pettit J-B, Saraiva LR, Gavriouchkina D, Larsson T, Arendt D, Marioni JC (2015) High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat Biotechnol 33:503–509

Adams M, Kelley J, Gocayne J, Dubnick M, Polymeropoulos M, Xiao H, Merril C, Wu A, Olde B, Moreno R, Kerlavage A, McCombie WR, Venter JC (1991) Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252:1651–1656

Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song X, Richmond TA, Middle CM, Rodesch MJ, Packard CJ, Weinstock GM, Gibbs RA (2007) Direct selection of human genomic loci by microarray hybridization. Nat Methods 4:903–905

Anderson S (1981) Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic Acids Res 9:3015–3027

Andrews KR, Hohenlohe PA, Miller MR, Hand BK, Seeb JE, Luikart G (2014) Trade-offs and utility of alternative RADseq methods: reply to Puritz et al. Mol Ecol 23:5943–5946

Andrews KR, Good JM, Miller MR, Luikart G, Hohenlohe PA (2016) Harnessing the power of RADseq for ecological and evolutionary genomics. Nat Rev Genet 17:81–92

Arnold B, Corbett-Detig RB, Hartl D, Bomblies K (2013) RADseq underestimates diversity and introduces genealogical biases due to nonrandom haplotype sampling. Mol Ecol 22:3179–3190

Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3:e3376

Boguski MS, Lowe TMJ, Tolstoshev CM (1993) dbEST – database for «expressed sequence tags». Nat Genet 4:332–333

Briggs AW, Good JM, Green RE, Krause J, Maricic T, Stenzel U, Lalueza-Fox C, Rudan P, Brajković D, Kućan Ž, Gušić I, Schmitz R, Doronichev VB, Golovanova LV, de la Rasilla M, Fortea J, Rosas A, Pääbo S (2009) Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325:318–321

Carpenter ML, Buenrostro JD, Valdiosera C, Schroeder H, Allentoft ME, Sikora M, Rasmussen M, Gravel S, Guillén S, Nekhrizov G, Leshtakov K, Dimitrova D, Theodossiev N, Pettener D, Luiselli D, Sandoval K, Moreno-Estrada A, Li Y, Wang J, Gilbert MTP, Willerslev E, Greenleaf WJ, Bustamante CD (2013) Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am J Hum Genet 93:852–864

Chaisson MJ, Brinza D, Pevzner PA (2009) De novo fragment assembly with short mate-paired reads: does the read length matter? Genome Res 19:336–346

Church G, Kieffer-Higgins S (1988) Multiplex DNA sequencing. Science 240:185–188

Cruaud A, Gautier M, Galan M, Foucaud J, Sauné L, Genson G, Dubois E, Nidelet S, Deuve T, Rasplus J-Y (2014) Empirical assessment of RAD sequencing for interspecific phylogeny. Mol Biol Evol 31:1272–1274

Dabney J, Knapp M, Glocke I, Gansauge M-T, Weihmann A, Nickel B, Valdiosera C, García N, Pääbo S, Arsuaga J-L, Meyer M (2013) Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc Natl Acad Sci U S A 110:15758–15763

Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499–510

Dermitzakis ET, Reymond A, Antonarakis SE (2005) Conserved non-genic sequences – an unexpected feature of mammalian genomes. Nat Rev Genet 6:151–157

Deschamps S, Llaca V, May GD (2012) Genotyping-by-sequencing in plants. Biology 1:460

Dunn CW, Hejnol A, Matus DQ, Pang K, Browne WE, Smith SA, Seaver E, Rouse GW, Obst M, Edgecombe GD, Sorensen MV, Haddock SHD, Schmidt-Rhaesa A, Okusu A, Kristensen RM, Wheeler WC, Martindale MQ, Giribet G (2008) Broad phylogenomic sampling improves resolution of the animal tree of life. Nature 452:745–750

Dunn CW, Giribet G, Edgecombe GD, Hejnol A (2014) Animal phylogeny and its evolutionary implications. Annu Rev Ecol Syst 45:371–395

Ekblom R, Galindo J (2011) Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity 107:1–15

Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379

Etter PD, Preston JL, Bassham S, Cresko WA, Johnson EA (2011) Local de novo assembly of RAD paired-end contigs using short sequencing reads. PLoS One 6:e18561

Eytan RI, Evans BR, Dornburg A, Lemmon AR, Lemmon EM, Wainwright PC, Near TJ (2015) Are 100 enough? Inferring acanthomorph teleost phylogeny using anchored hybrid enrichment. BMC Evol Biol 15:113

Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC (2012) Ultraconserved elements anchor thousands of genetic markers spanning multiple evolutionary timescales. Syst Biol 61:717–726

Friis G, Aleixandre P, Rodríguez-Estrella R, Navarro-Sigüenza AG, Milá B (2016) Rapid postglacial diversification and long-term stasis within the songbird genus Junco: phylogeographic and phylogenomic evidence. Mol Ecol 24:6175–6195

Fu Q, Meyer M, Gao X, Stenzel U, Burbano HA, Kelso J, Pääbo S (2013) DNA analysis of an early modern human from Tianyuan Cave, China. Proc Natl Acad Sci U S A 110:2223–2227

Futschik A, Schlötterer C (2010) The next generation of molecular markers from massively parallel sequencing of pooled DNA samples. Genetics 186:207–218

Gardner RC, Howarth AJ, Hahn P, Brown-Luedi M, Shepherd RJ, Messing J (1981) The complete nucleotide sequence of an infectious clone of cauliflower mosaic virus by M13mp7 shotgun sequencing. Nucleic Acids Res 9:2871–2888

Gawad C, Koh W, Quake SR (2016) Single-cell genome sequencing: current state of the science. Nat Rev Genet 17:175–188

Gilbert S (2013) Developmental biology, 10th edn. Sinauer Associates Inc., Sunderland

Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C (2009) Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 27:182–189

Gonen S, Lowe NR, Cezard T, Gharbi K, Bishop SC, Houston RD (2014) Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing. BMC Genomics 15:1–17

Green ED (2001) Strategies for the systematic sequencing of complex genomes. Nat Rev Genet 2:573–583

Herrera S, Reyes-Herrera PH, Shank TM (2015) Predicting RAD-seq marker numbers across the eukaryotic tree of life. Genome Biol Evol 7:3207–3225

Houston RD, Davey JW, Bishop SC, Lowe NR, Mota-Velasco JC, Hamilton A, Guy DR, Tinch AE, Thomson ML, Blaxter ML, Gharbi K, Bron JE, Taggart JB (2012) Characterisation of QTL-linked and genome-wide restriction site-associated DNA (RAD) markers in farmed Atlantic salmon. BMC Genomics 13:244

Huang H, Knowles LL (2014) Unforeseen consequences of excluding missing data from next-generation sequences: simulation study of RAD sequences. Syst Biol 65:357–365

International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921

Jones MR, Good JM (2016) Targeted capture in evolutionary and ecological genomics. Mol Ecol 25:185–202

Labonte JM, Swan BK, Poulos B, Luo H, Koren S, Hallam SJ, Sullivan MB, Woyke T, Wommack KE, Stepanauskas R (2015) Single-cell genomics-based analysis of virus-host interactions in marine surface bacterioplankton. ISME J 9:2386–2399

Lam ET, Hastie A, Lin C, Ehrlich D, Das SK, Austin MD, Deshpande P, Cao H, Nagarajan N, Xiao M, Kwok P-Y (2012) Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 30:771–776

Lee JH, Daugharthy ER, Scheiman J, Kalhor R, Yang JL, Ferrante TC, Terry R, Jeanty SSF, Li C, Amamoto R, Peters DT, Turczyk BM, Marblestone AH, Inverso SA, Bernard A, Mali P, Rios X, Aach J, Church GM (2014) Highly multiplexed subcellular RNA sequencing in situ. Science 343:1360–1363

Lee H, Gurtowski J, Yoo S, Nattestad M, Marcus S, Goodwin S, McCombie W, Schatz M (2016) Third-generation sequencing and the future of genomics. BioRxiv. http://dx.doi.org/10.1101/048603

Lemmon EM, Lemmon AR (2013) High-throughput genomic data in systematics and phylogenetics. Annu Rev Ecol Syst 44:99–121

Lemmon AR, Emme SA, Lemmon EM (2012) Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst Biol 61:727–744

Lepais O, Weir JT (2014) SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches. Mol Ecol Resour 14:1314–1321

Li C, Hofreiter M, Straube N, Corrigan S, Naylor GJP (2013) Capturing protein-coding genes across highly divergent species. BioTechniques 54:321–326

Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, Amit I, Lajoie BR, Sabo PJ, Dorschner MO, Sandstrom R, Bernstein B, Bender MA, Groudine M, Gnirke A, Stamatoyannopoulos J, Mirny LA, Lander ES, Dekker J (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326:289–293

Lipshutz RJ, Fodor SPA, Gingeras TR, Lockhart DJ (1999) High density synthetic oligonucleotide arrays. Nat Genet 21:20–24

Liu Y, Zhou J, White KP (2014) RNA-seq differential expression studies: more sequence or more replication? Bioinformatics 30:301–304

Liu S, Wang X, Xie L, Tan M, Li Z, Su X, Zhang H, Misof B, Kjer KM, Tang M, Niehuis O, Jiang H, Zhou X (2016) Mitochondrial capture enriches mito-DNA 100 fold, enabling PCR-free mitogenomics biodiversity analysis. Mol Ecol Resour 16:470–479

Macaulay IC, Haerty W, Kumar P, Li YI, Hu TX, Teng MJ, Goolam M, Saurat N, Coupland P, Shirley LM, Smith M, Van der Aa N, Banerjee R, Ellis PD, Quail MA, Swerdlow HP, Zernicka-Goetz M, Livesey FJ, Ponting CP, Voet T (2015) G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods 12:519–522

Mahairas GG, Wallace JC, Smith K, Swartzell S, Holzman T, Keller A, Shaker R, Furlong J, Young J, Zhao S, Adams MD, Hood L (1999) Sequence-tagged connectors: a sequence approach to mapping and scanning the human genome. Proc Natl Acad Sci U S A 96:9739–9744

Malone JH, Oliver B (2011) Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol 9:34

Marra MA, Kucaba TA, Dietrich NL, Green ED, Brownstein B, Wilson RK, McDonald KM, Hillier LW, McPherson JD, Waterston RH (1997) High throughput fingerprint analysis of large-insert clones. Genome Res 7:1072–1084

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT (2013) Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol 66:526–538

McGee MD, Faircloth BC, Borstein SR, Zheng J, Hulsey CD, Wainwright PC, Alfaro ME (2016) Replicated divergence in cichlid radiations mirrors a major vertebrate innovation. Proc R Soc Lond B Biol Sci 283:20151413

McIntyre LM, Lopiano KK, Morse AM, Amin V, Oberg AL, Young LJ, Nuzhdin SV (2011) RNA-seq: technical variability and sampling. BMC Genomics 12:293

Mendelowitz L, Pop M (2014) Computational methods for optical mapping. Gigascience 3:33

Meyer M, Fu Q, Aximu-Petri A, Glocke I, Nickel B, Arsuaga J-L, Martinez I, Gracia A, de Castro JMB, Carbonell E, Pääbo S (2014) A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505:403–406

Michaeli Y, Ebenstein Y (2012) Channeling DNA for optical mapping. Nat Biotechnol 30:762–763

Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5:621–628

Mozo T, Dewar K, Dunn P, Ecker JR, Fischer S, Kloska S, Lehrach H, Marra M, Martienssen R, Meier-Ewert S, Altmann T (1999) A complete BAC-based physical map of the Arabidopsis thaliana genome. Nat Genet 22:271–275

Orlando L, Ginolhac A, Zhang G, Froese D, Albrechtsen A, Stiller M, Schubert M, Cappellini E, Petersen B, Moltke I, Johnson PLF, Fumagalli M, Vilstrup JT, Raghavan M, Korneliussen T, Malaspinas A-S, Vogt J, Szklarczyk D, Kelstrup CD, Vinther J, Dolocan A, Stenderup J, Velazquez AMV, Cahill J, Rasmussen M, Wang X, Min J, Zazula GD, Seguin-Orlando A, Mortensen C, Magnussen K, Thompson JF, Weinstock J, Gregersen K, Roed KH, Eisenmann V, Rubin CJ, Miller DC, Antczak DF, Bertelsen MF, Brunak S, Al-Rasheid KAS, Ryder O, Andersson L, Mundy J, Krogh A, Gilbert MTP, Kjaer K, Sicheritz-Ponten T, Jensen LJ, Olsen JV, Hofreiter M, Nielsen R, Shapiro B, Wang J, Willerslev E (2013) Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499:74–78

Oshlack A, Wakefield MJ (2009) Transcript length bias in RNA-seq data confounds systems biology. Biol Direct 4:14

Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double Digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS One 7:e37135

Poland JA, Brown PJ, Sorrells ME, Jannink J-L (2012) Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One 7:e32253

Prum RO, Berv JS, Dornburg A, Field DJ, Townsend JP, Lemmon EM, Lemmon AR (2015) A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526:569–573

Puritz JB, Matz MV, Toonen RJ, Weber JN, Bolnick DI, Bird CE (2014) Demystifying the RAD fad. Mol Ecol 23:5937–5942

Putnam NH, O’Connell B, Stites JC, Rice BJ, Fields A, Hartley PD, Sugnet CW, Haussler D, Rokhsar DS, Green RE (2016) Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res 26:342–350

Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND, Betel D (2013) Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data. Genome Biol 14:1–13

Roberts RJ, Vincze T, Posfai J, Macelis D (2015) REBASE—a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res 43:D298–D299

Ruane S, Raxworthy CJ, Lemmon AR, Lemmon EM, Burbrink FT (2015) Comparing species tree estimation with large anchored phylogenomic and small Sanger-sequenced molecular datasets: an empirical study on Malagasy pseudoxyrhophiine snakes. BMC Evol Biol 15:1–14

Satija R, Farrell JA, Gennert D, Schier AF, Regev A (2015) Spatial reconstruction of single-cell gene expression data. Nat Biotechnol 33:495–502

Schena M, Shalon D, Davis RW, Brown PO (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470

Schuler GD (1997) Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med 75:694–698

Schulze A, Downward J (2001) Navigating gene expression using microarrays – a technology review. Nat Cell Biol 3:E190–E195

Schwarz A, Cabezas-Cruz A, Kopecky J, Valdes JJ (2014) Understanding the evolutionary structural variability and target specificity of tick salivary Kunitz peptides using next generation transcriptome data. BMC Evol Biol 14

Shapiro E, Biezuner T, Linnarsson S (2013) Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat Rev Genet 14:618–630

Shizuya H, Birren B, Kim UJ, Mancino V, Slepak T, Tachiiri Y, Simon M (1992) Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc Natl Acad Sci U S A 89:8794–8797

Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 15:1034–1050

Smith HO, Wilcox KW (1970) A restriction enzyme from Hemophilus influenzae: I. Purification and general properties. J Mol Biol 51:379–391

Soderlund C, Longden I, Mott R (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci CABIOS 13:523–535

Stepanauskas R (2012) Single cell genomics: an individual look at microbes. Curr Opin Microbiol 15:613–620

Studier FW, Moffatt BA (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol 189:113–130

Swan BK, Martinez-Garcia M, Preston CM, Sczyrba A, Woyke T, Lamy D, Reinthaler T, Poulton NJ, Masland EDP, Gomez ML, Sieracki ME, DeLong EF, Herndl GJ, Stepanauskas R (2011) Potential for chemolithoautotrophy among ubiquitous bacteria lineages in the dark ocean. Science 333:1296–1300

Tang F, Lao K, Surani MA (2011) Development and applications of single-cell transcriptome analysis. Nat Methods 8:S6–11

Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A (2011) Differential expression in RNA-seq: a matter of depth. Genome Res 21:2213–2223

Telford MJ, Budd GE, Philippe H (2015) Phylogenomic insights into animal evolution. Curr Biol 25:R876–R887

The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815

The C. elegans Sequencing Consortium (1998) Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282:2012–2018

Todd EV, Black MA, Gemmell NJ (2016) The power and promise of RNA-seq in ecology and evolution. Mol Ecol 25:1224–1241

Toonen RJ, Puritz JB, Forsman ZH, Whitney JL, Fernandez-Silva I, Andrews KR, Bird CE (2013) ezRAD: a simplified method for genomic genotyping in non-model organisms. PeerJ 1:e203

van Heesch S, Kloosterman WP, Lansu N, Ruzius F-P, Levandowsky E, Lee CC, Zhou S, Goldstein S, Schwartz DC, Harkins TT, Guryev V, Cuppen E (2013) Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing. BMC Genomics 14:257

Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Francesco VD, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji R-R, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang ZY, Wang A, Wang X, Wang J, Wei M-H, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu SC, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers Y-H, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, 
Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang Y-H, Coyne M, Dahlke C, Mays AD, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X (2001) The sequence of the human genome. Science 291:1304–1351

Vijay N, Poelstra JW, Künstner A, Wolf JBW (2013) Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Mol Ecol 22:620–634

Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63

Wang Y, Ghaffari N, Johnson CD, Braga-Neto UM, Wang H, Chen R, Zhou H (2011) Evaluation of the coverage and depth of transcriptome by RNA-Seq in chickens. BMC Bioinform 12(Suppl. 10):S5

Wang S, Meyer E, McKay JK, Matz MV (2012) 2b-RAD: a simple and flexible method for genome-wide genotyping. Nat Methods 9:808–810

Weber JL, Myers EW (1997) Human whole-genome shotgun sequencing. Genome Res 7:401–409

Wolf JBW (2013) Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial. Mol Ecol Resour 13:559–572

Yilmaz S, Singh AK (2012) Single cell genome sequencing. Curr Opin Biotechnol 23:437–443

Zheng W-X, Zhang C-T (2008) Ultraconserved elements between the genomes of the plants Arabidopsis thaliana and Rice. J Biomol Struct Dyn 26:1–8
