Principles of reference alignment and de novo assembly. (a) Alignment of paired-end reads to two chromosomes of a reference genome. Arrows with the same color indicate reads that belong to the same pair. Red arrows illustrate a normal pair, aligning with the expected orientation and distance. Green arrows illustrate a pair that aligns at a larger distance than expected, suggesting a potential deletion in the sequenced genome. Orange arrows illustrate a pair that aligns to different chromosomes, indicating a potential rearrangement in the sequenced genome. Blue arrows illustrate how pairing can guide alignment when one read of the pair falls in a repeated (grey) region: the uniquely aligned mate constrains where the ambiguous read is placed. (b) De novo assembly of paired-end reads without the guidance of a reference. Overlapping reads (arrows) are assembled into clusters, and the consensus sequence of each cluster is called a contig. Reads of the same pair that belong to different contigs (red arrows) can help to order contigs into scaffolds. Because the average size of the original fragments is known, the size of the gap between the contigs can be estimated.
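
The pair categories in panel (a) and the gap estimate in panel (b) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular aligner or assembler: the `Alignment` record, the function names, the `tolerance` parameter, and the "shorter-than-expected" and "abnormal-orientation" labels are hypothetical and do not come from the figure. Coordinates are assumed to be 0-based with an exclusive end, and a normal pair is assumed to point inward (left read forward, right read reverse).

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record for one aligned read (not a real library type)."""
    chrom: str      # reference sequence name
    start: int      # 0-based leftmost aligned position
    end: int        # exclusive end position
    forward: bool   # True if the read aligns to the forward strand


def classify_pair(a: Alignment, b: Alignment,
                  mean_insert: int, tolerance: int) -> str:
    """Label a read pair following the categories of panel (a)."""
    # Mates on different chromosomes: potential rearrangement (orange).
    if a.chrom != b.chrom:
        return "inter-chromosomal"
    left, right = (a, b) if a.start <= b.start else (b, a)
    # Assumption: a normal pair points inward (forward, then reverse).
    if not (left.forward and not right.forward):
        return "abnormal-orientation"
    span = right.end - left.start  # outer distance covered by the pair
    # Mates farther apart than expected: potential deletion (green).
    if span > mean_insert + tolerance:
        return "possible-deletion"
    # Not shown in the figure; included so the labels are exhaustive.
    if span < mean_insert - tolerance:
        return "shorter-than-expected"
    return "normal"  # expected orientation and distance (red)


def estimate_gap(len_a: int, start_a: int, end_b: int,
                 mean_insert: int) -> int:
    """Estimate the gap between contigs A and B from one spanning pair.

    The fragment covers (len_a - start_a) bases at the end of contig A,
    then the gap, then end_b bases at the start of contig B, so
    gap = mean_insert - (len_a - start_a) - end_b.
    """
    return mean_insert - (len_a - start_a) - end_b


if __name__ == "__main__":
    r1 = Alignment("chr1", 100, 200, True)
    r2 = Alignment("chr1", 2900, 3000, False)
    print(classify_pair(r1, r2, mean_insert=500, tolerance=100))
    print(estimate_gap(len_a=1000, start_a=900, end_b=150, mean_insert=500))
```

In practice a single pair gives a noisy estimate, because individual fragment sizes vary around the mean; averaging over all pairs that link the same two contigs would be the natural refinement of this sketch.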