Inferring positive selection in humans from genomic data

Adaptation can be described as an evolutionary process that leads to an adjustment of the phenotypes of a population to their environment. In the classical view, new mutations can introduce novel phenotypic features into a population that leave footprints in the genome after fixation, such as selective sweeps. Alternatively, existing genetic variants may become beneficial after an environmental change and increase in frequency. Although they may not reach fixation, they may cause a shift of the optimum of a phenotypic trait controlled by multiple loci. With the availability of polymorphism data from various organisms, including humans and chimpanzees, it has become possible to detect molecular evidence of adaptation and to estimate the strength and target of positive selection. In this review, we discuss the two competing models of adaptation and suitable approaches for detecting the footprints of positive selection on the molecular level.


Introduction
Understanding the genetic architecture and evolution of phenotypes that are present in populations adapting to heterogeneous environments has been a long-standing interest in evolutionary biology [1][2][3]. This question has been studied by means of quantitative genetics and population genetics. Quantitative genetics provides the methods to describe differences in the distribution of phenotypes, determine their heritability and map relevant regions controlling the phenotype in the genome [4]. In contrast, population genetics provides a framework to describe changes of allele frequencies that are known to be mostly determined by genetic drift [5] and selection [6]. The latter field produced a profound theory about the process of adaptation on the sequence level [7], which gave rise to an abundance of population genetic tools that can be applied to genetic data independent of phenotypes [8][9][10][11][12][13][14].
Results from genome-wide scans, however, were often inconclusive [8,35,36]. The lack of reproducibility has been attributed to insufficient power of the tests [37], the masking of signals of positive selection by purifying selection (for example, [38]) or complex demographic histories (for example, [39]). Furthermore, the classical model of adaptation in which single new mutations with large effects are favoured by recent positive selection has been questioned [40,41]. The question was also raised whether evidence for more general models of adaptation (in particular those involving quantitative genetic variation) could be detected on the genomic level [39,40]. This latter issue became particularly interesting in the face of an influx of huge amounts of data from genome-wide association studies [42,43].
In this review, we summarize the population genetic and quantitative genetic models of adaptation and describe the methods to detect the footprints of adaptation in the genome. Furthermore, we provide examples of adaptation in humans that illustrate these theoretical accomplishments.

Population genetic models of adaptation
Genetic adaptation is the result of fitness differences of alleles. Consider the alleles a and A at a bi-allelic locus in diploid organisms as mutant and wild type, respectively. A fitness value may be assigned to each possible genotype aa, aA and AA. Mutations are neutral if the fitness effects are equal (that is, w_aa = w_aA = w_AA), which is the case for most of the genetic variation observed in humans [44]. In the classical model, positive selection occurs when the derived allele has a higher fitness than the ancestral allele, and negative (or purifying) selection, when the derived allele is detrimental to the organism. Balancing selection occurs in the case of heterozygote advantage and in situations of spatial and temporal heterogeneity of selection. Nucleotide changes in the DNA sequence may have some direct or indirect effect on the phenotype of the individual that generates a fitness advantage or disadvantage and hence are assumed to occur in coding regions of genes or regulatory sequences [45].
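The effect of such fitness differences on allele frequencies can be illustrated with a minimal sketch. It assumes random mating, discrete generations and an infinitely large population (so that genetic drift is ignored); the function names and the fitness values mentioned below are illustrative assumptions and are not taken from the cited studies.

```python
def next_allele_frequency(p, w_aa, w_aA, w_AA):
    """Frequency of the mutant allele a after one generation of selection.

    p is the current frequency of allele a; w_aa, w_aA and w_AA are the
    fitness values of the three genotypes (random mating is assumed).
    """
    q = 1.0 - p
    # Mean fitness of the population under Hardy-Weinberg proportions
    mean_fitness = p * p * w_aa + 2.0 * p * q * w_aA + q * q * w_AA
    # Allele a is carried by aa homozygotes and by half of the heterozygotes
    return (p * p * w_aa + p * q * w_aA) / mean_fitness


def trajectory(p0, w_aa, w_aA, w_AA, generations):
    """Deterministic frequency trajectory of allele a over several generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_allele_frequency(freqs[-1], w_aa, w_aA, w_AA))
    return freqs
```

With equal fitness values the frequency does not change, a call such as `trajectory(0.01, 1.05, 1.02, 1.0, 500)` follows a rare beneficial allele towards fixation, and heterozygote advantage (for example, fitness values 0.8, 1.0 and 0.9) maintains both alleles at a stable intermediate frequency.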
Expected patterns of positive selection in the genome: A beneficial mutation may rise quickly in frequency under positive selection. If the beneficial allele goes to fixation, genetic hitchhiking [46,47] results in a depletion of variation around the selected site, also termed a selective sweep [46,47]. If the beneficial allele has not yet reached fixation, the sweep is called incomplete, partial, or ongoing. Sweep patterns that arise from a newly introduced mutation or migrant are considered a 'hard sweep'. If, however, the beneficial allele arises from standing variation, that is, after an environmental change, the pattern of nucleotide polymorphism generated after fixation of the beneficial allele is called a 'soft sweep' [48][49][50]. In this latter model, adaptation is not limited by the occurrence of new mutations and can therefore occur more rapidly after an environmental change [49]. The resulting pattern of variation of a soft sweep becomes very similar to that of a hard sweep if the initial frequency of the beneficial allele is low. This situation may occur if the allele is initially in mutation-selection balance and becomes positively selected after an environmental shift [46,47].
The genomic signatures of recent adaptation can be measured by means of the site frequency spectrum (SFS), which summarizes the counts of derived variants in a region. Under the action of positive directional selection, the SFS exhibits an excess of both rare and high-frequency derived variants around the selected site that are present in the population at the time of fixation of the beneficial allele [51,52]. The size of the region with depleted variation is expected to be larger when recombination is low and/or selection is strong [47,53], provided that hitchhiking has started from a selected allele at low frequency. The transient phase, until the beneficial mutation reaches fixation in the population, is inversely proportional to the population size [54]. Furthermore, in a subdivided population a frequency shift of a beneficial allele may lead to increased genetic differentiation between subpopulations in comparison to a population that has not been subjected to selection [55]. In the extreme case, fixed differences between subpopulations may be observed.
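As an illustration of how the SFS summarizes a sample, the following minimal sketch counts derived alleles per site. It assumes that haplotypes are given as equally long lists of 0 (ancestral) and 1 (derived), that is, that the ancestral state is known; the function name and the data layout are illustrative assumptions.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS of a sample of haplotypes coded 0 (ancestral) / 1 (derived).

    Returns a list sfs of length n - 1, where sfs[i - 1] is the number of
    sites at which the derived allele occurs on exactly i of the n haplotypes.
    Sites that are monomorphic in the sample are ignored.
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over alignment columns
        derived_count = sum(site)
        if 0 < derived_count < n:
            sfs[derived_count - 1] += 1
    return sfs


sample = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 1],
]
spectrum = site_frequency_spectrum(sample)  # two singletons, one doubleton
```

In this toy sample the last site is fixed for the derived allele and therefore does not contribute to the spectrum.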
The signature of linkage disequilibrium (LD) around the selected site is another characteristic of the hitchhiking process. LD emerges between pairs of sites due to nonrandom association of alleles. When selection is strong and a sweep is in progress, LD among hitchhiking alleles will strongly increase [56,57], due to limited time for recombination events to occur. However, after the beneficial allele driving hitchhiking has reached an intermediate frequency of around 50%, LD between variants across the selected site decreases rapidly and eventually disappears when fixation has occurred. In contrast, the LD between polymorphisms on either side remains high and decreases only slowly. The establishment of the well-known long-range haplotypes in a population [21] is a consequence of the strong LD around the selected site in the first half of the selective phase (until the beneficial allele reaches intermediate frequency). Therefore, these extended haplotypes can be used to detect incomplete (ongoing) sweeps that are typical for humans [9,36]. The use of LD has the advantage that it is relatively robust against purifying selection [38].
Once a beneficial mutation has been fixed in a population, the signature of linkage disequilibrium decreases and the pattern of polymorphism in the neighbourhood can be restored. The time range to detect these LD signatures of recent adaptation in a single population is rather limited (for example, on the scale of 10,000 years in the case of humans [36]), and they are measurable only when adaptation is still ongoing or has only recently ceased. The fixed differences between populations or species remain evident much longer (millions of years, as in comparisons of humans with chimpanzees [36]).
These latter genomic signatures of positive selection, however, may not be unique. It is known that population expansion as well as sudden decreases in population size (bottlenecks) can result in similar genomic patterns, such as an excess of rare and intermediate-frequency derived variants, respectively [58,59]. For example, while human populations were migrating out of Africa, consecutive population bottlenecks followed by population expansion occurred [60,61]. Such a cascade of demographic events is expected to leave patterns in the genome that are very similar to those of selective sweeps [62].
Furthermore, population structure can mimic the signature of balancing selection [63]. The inference of the demographic history of a population can in addition be confounded by the genotyping technology, leading to single nucleotide polymorphism (SNP) ascertainment bias [64,65]. Choosing SNPs for genotyping from a discovery sample that is too small can skew the resulting site frequency spectrum toward intermediate frequencies.
Alternative modes of selection may also result in signatures similar to those produced by positive directional selection. In particular, background selection may also lead to a depletion of variation [66,67], yet without causing shifts of low- and high-frequency derived variants in the SFS. This signature may resemble that of multiple selective sweeps (recurrent sweeps; [68]) and may result in a lack of high-frequency derived variants [69]. Selective sweeps may also be difficult to distinguish from recombination hotspots [70]. If recombination is strong, the region of depleted variation may become too small to be recognized. In contrast, a recombination cold spot can generate a pattern of increased LD that is similar to the pattern of a sweep in progress [71]. Furthermore, varying recombination rates on a fine scale may also confound the long-range haplotype signature of sweeps.
A common statistical approach dealing with these difficulties is to derive a likelihood by comparing a statistical null model that includes all the aforementioned nonselective effects to an alternative model that in addition contains positive selection. Many of the confounding factors, however, are difficult to model jointly in a likelihood framework. In an alternative approach, summary statistics are constructed that quantify specific patterns of selective forces and are applied genome-wide. Regions with the strongest signals are considered as outliers. Statistical significance is then assessed by simulating a null model using the coalescent [72]. In the following, we review statistical approaches and their applications taking these confounding effects into account.
Statistical tests to detect deviations from neutrality: Several tests have been developed that make use of the aforementioned signatures of hitchhiking, that is, the reduction of genetic variation, the skew in the frequency spectrum and the pattern of linkage disequilibrium. These tests may be broadly categorized into three classes: (i) tests that use only data from one population, (ii) tests that compare genetic signatures among multiple populations and (iii) comparative tests that use a closely related species as an out-group. The tests can be further classified into model-free and model-based methods. The latter use the neutral theory [5] to build the null hypothesis and can be applied to compare single candidate regions to a neutral expectation when full genome data are not available. In contrast, model-free methods try to quantify the characteristic signatures of hitchhiking and are usually applied in an outlier approach to genome-wide data. Regions that show the strongest signals are assumed to be candidates for sweeps [8,10,73].
The most widely used method in the first class of tests is Tajima's D statistic [74], which compares two estimates of genetic variation, one based on the number of segregating sites and one based on the average number of pairwise differences, that have the same expected value when the population size is assumed to be constant over time (standard neutral model). Large positive values indicate an excess of intermediate-frequency variation in the tested region that could be due to balancing selection, whereas negative values indicate an excess of rare variants, as expected after a depletion of variation due to positive directional selection. The interpretation of Tajima's D statistic, however, may be ambiguous as the demographic history of a population needs to be taken into account. Therefore, several more recent developments corrected Tajima's D statistic, for instance, by including population size changes [75] or SNP ascertainment bias [76] that can arise from genotyping technology [64].
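The computation behind Tajima's D can be sketched as follows. The sketch uses the standard textbook formulation, in which the average number of pairwise differences is compared with the number of segregating sites scaled by a harmonic number; it assumes bi-allelic sites coded 0/1 and a constant-size neutral null model, includes none of the corrections for demography or ascertainment bias mentioned above, and the function name is an illustrative assumption.

```python
from math import nan, sqrt


def tajimas_d(haplotypes):
    """Tajima's D for a sample of haplotypes coded 0/1 at bi-allelic sites.

    Returns nan if the sample has fewer than four haplotypes (the variance
    term vanishes for smaller samples) or contains no segregating site.
    """
    n = len(haplotypes)
    counts = [sum(site) for site in zip(*haplotypes)]
    counts = [k for k in counts if 0 < k < n]  # keep segregating sites only
    s = len(counts)
    if n < 4 or s == 0:
        return nan
    # Average number of pairwise differences, summed over sites
    pi = sum(2.0 * k * (n - k) for k in counts) / (n * (n - 1))
    # Constants of the variance approximation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference of the two diversity estimates, scaled by its standard deviation
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy samples: only singletons versus only intermediate-frequency variants
singletons = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
intermediate = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 0]]
```

The sample containing only singletons yields a negative value and the sample containing only intermediate-frequency variants a positive one, in line with the interpretation given above.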
Fay and Wu's H test [52] uses, in addition, data from an out-group species to obtain information on the ancestral state of a polymorphism and detects selective sweeps by an excess of high-frequency derived polymorphisms. In contrast, Fu and Li's D statistic [77] takes advantage of low-frequency variation that is enriched in regions that recently underwent genetic hitchhiking. The maximum frequency of derived mutations (MFDM) test [78] utilizes the MFDM to estimate the presence of an unbalanced tree topology in a given sample that is thought to arise in the vicinity of a locus that is under positive selection due to hitchhiking [46,52]. In line with coalescent theory, the tree topology is independent of changes in population size, which makes the MFDM statistic robust against demographic events, such as bottlenecks or expansions [78]. To obtain good estimates for the MFDM statistic, large sample sizes of at least 42 chromosomes (21 diploids) are necessary [78], and the samples have to be unaffected by migration, admixture or any hidden population substructure.
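The contrast between intermediate- and high-frequency derived variants that underlies the H statistic can be sketched from an unfolded SFS. The sketch uses the commonly quoted unnormalized form, H = theta_pi - theta_H, and assumes that the ancestral state of every site has been inferred correctly from the out-group; the function name and the input layout (a list whose entry i - 1 holds the number of sites with i derived copies) are illustrative assumptions.

```python
def fay_wu_h(sfs):
    """Unnormalized Fay and Wu's H computed from an unfolded SFS.

    sfs[i - 1] is the number of sites with i derived copies in a sample of
    n = len(sfs) + 1 chromosomes. Negative values indicate an excess of
    high-frequency derived variants.
    """
    n = len(sfs) + 1
    pairs = n * (n - 1)
    # theta_pi weights each site by its heterozygosity, i * (n - i)
    theta_pi = sum(2.0 * xi * i * (n - i) for i, xi in enumerate(sfs, start=1)) / pairs
    # theta_H weights each site by the squared derived-allele count, i * i
    theta_h = sum(2.0 * xi * i * i for i, xi in enumerate(sfs, start=1)) / pairs
    return theta_pi - theta_h


# Spectrum proportional to 1/i (the neutral expectation) for n = 4
h_neutral = fay_wu_h([6, 3, 2])
# Three sites at which the derived allele is carried by 9 of 10 chromosomes
h_sweep = fay_wu_h([0, 0, 0, 0, 0, 0, 0, 0, 3])
```

A spectrum proportional to 1/i gives H = 0, whereas the spectrum dominated by high-frequency derived variants gives a clearly negative value.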
A statistic that uses the full site frequency spectrum has been introduced by Kim and Stephan [54]. Here, a composite likelihood ratio (CLR) is calculated by multiplying the probabilities of all polymorphic sites of a genomic region, which makes it possible to estimate the strength and location of a selective sweep. The method returns a likelihood of a complete sweep compared to a population that evolves under standard neutrality, and estimates of the selection parameter and the target of selection. This test has been further developed by Nielsen et al. [8] to detect deviations from a background spectrum that includes deviations from neutrality due to demographic history and SNP ascertainment bias under the assumption that the selective sweep has been completed. A demographic model consisting of two epochs of population sizes has been incorporated into the CLR approach by Williamson et al. [31]. Finally, LD has been combined with this composite likelihood framework by Pavlidis et al. [79], which reduces the number of false positives. Currently, the most advanced CLR-based test is SweeD [80], which includes a demographic model with an arbitrary number of instantaneous changes in population size [81]. The power of this test increases with sample size up to about 500.
A large fraction of model-free tests are also based on the patterns of LD. Many tests take advantage of the haplotype homozygosity as introduced by Sabeti et al. [21], which is a measure of genetic diversity regarding multiple polymorphic sites [82]. The decay of the extended haplotype homozygosity (EHH) as calculated step by step from a defined core haplotype was designed as a test for positive selection. This test, however, cannot easily distinguish between complete and incomplete sweeps. Several modifications of the EHH test statistic have been introduced that account for the confounding effect of varying recombination rates. The relative extended haplotype homozygosity (REHH) is defined to be the extended homozygosity of a core haplotype divided by the homozygosity of the remaining core haplotypes combined [83]. The integrated haplotype score (iHS) as proposed by Voight et al. [22] compares the decay of haplotype homozygosity around the ancestral allele with that around the derived allele. If the derived allele is beneficial, its underlying haplotype will take longer to decay than the ancestral one. While this test cannot be applied to sites that are already fixed, it is useful to detect recent sweeps that are still in progress (that is, incomplete sweeps). As the latter tests do not compare the observation with a theoretical expectation, they are mostly used in a statistical outlier approach.
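The core quantity of these haplotype-based tests can be sketched as follows. The sketch computes EHH as the probability that two randomly chosen chromosomes carrying a given core allele are identical at all sites between the core and a chosen end position; it assumes phased haplotypes coded 0/1, uses a single SNP as the core, and ignores genetic distances, so it is a simplified illustration rather than the implementation used in the cited studies. The function name and arguments are assumptions.

```python
from collections import Counter
from math import comb, nan


def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity from a core SNP out to end_index.

    Only chromosomes that carry core_allele at core_index are considered.
    Returns nan if fewer than two chromosomes carry the core allele.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    if len(carriers) < 2:
        return nan
    lo, hi = sorted((core_index, end_index))
    # Count identical extended haplotypes spanning the interval [lo, hi]
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    identical_pairs = sum(comb(c, 2) for c in groups.values())
    return identical_pairs / comb(len(carriers), 2)


phased = [
    [0, 0, 1, 0, 0],  # derived core allele at index 2
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],  # ancestral core allele at index 2
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
ehh_derived = ehh(phased, 2, 1, 4)
ehh_ancestral = ehh(phased, 2, 0, 4)
```

In this toy sample, homozygosity decays more slowly on the derived background than on the ancestral one, which is the contrast that iHS integrates over distance and standardizes.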
The second class of tests compares recently diverging populations under the assumption that adaptation was acting differently on the populations. A test for detecting differentiation in allele frequencies between populations by means of Wright's fixation index F_ST [84] was first formulated by Lewontin and Krakauer [85]. This idea has been incorporated into various frequency- and LD-based test statistics. The CLR approach has been extended by Chen et al. [86] to model population structure by multi-locus allele frequency differentiation between two populations (XP-CLR). However, population size changes and associations between polymorphic sites were not considered in the model. The model-free Rsb measure proposed by Tang et al. [29] compares the haplotype homozygosity decay at homologous sites between two populations that diverged recently. Similarly, the XP-EHH method [83] compares the homozygosity decay among different populations. The latter tests take advantage of the assumption that local adaptation increases population differentiation compared to neutrally evolving subpopulations. Another extension of measuring differentiation between populations on a haplotype level is a method proposed by Fariello et al. [87] and Ferrer-Admetlla et al. [88] that has been shown to have more power to detect soft sweeps than SFS-based methods [88]. A combination of class one and class two tests has been proposed in [89]. The composite of multiple signals (CMS) test combines the signals of extended haplotypes (XP-EHH, iHS), high-frequency derived alleles (iHS), and polymorphic sites that exhibit population differentiation and results in a score that represents a posterior probability that a certain variant is under selection [89].
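The fixation index at a single bi-allelic site can be sketched in its simplest heterozygosity-based form, F_ST = (H_T - H_S) / H_T. The sketch assumes known population allele frequencies and equally weighted populations and does not correct for sample size, as estimators used in practice do; the function name is an illustrative assumption.

```python
from math import nan


def wright_fst(frequencies):
    """F_ST at one bi-allelic site from per-population allele frequencies.

    frequencies is a list with the frequency of one allele in each
    population; all populations receive equal weight.
    Returns nan if the site is monomorphic across all populations.
    """
    k = len(frequencies)
    # Mean expected heterozygosity within populations
    h_s = sum(2.0 * p * (1.0 - p) for p in frequencies) / k
    # Expected heterozygosity of the pooled population
    p_bar = sum(frequencies) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return nan
    return (h_t - h_s) / h_t


fst_moderate = wright_fst([0.2, 0.8])
fst_fixed = wright_fst([0.0, 1.0])
```

Identical frequencies give F_ST = 0 and a fixed difference gives F_ST = 1. In an outlier approach, such values would be computed for many sites and the upper tail of the genome-wide distribution inspected.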
The third class of tests uses the information of an out-group species to detect selection. Most widely used is the dN/dS ratio, also known as the Ka/Ks statistic [90]. The basic idea is that the ratio of non-synonymous to synonymous substitution rates is close to one under neutrality. The Hudson-Kreitman-Aguadé test (HKA, [91]) compares polymorphism within species by means of Watterson's estimator [92] and divergence between species across two or more loci. Under neutrality, the ratio of polymorphism to divergence is expected to be the same across loci, which is tested by means of a goodness-of-fit test. In contrast, the McDonald-Kreitman test compares polymorphism within populations and divergence between species at single loci for two classes of sites (for example, synonymous and non-synonymous sites) [93].
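The logic of the McDonald-Kreitman test can be sketched as a 2 x 2 contingency table of fixed differences and polymorphisms at non-synonymous and synonymous sites. The counts below are hypothetical, and the neutrality index, the derived estimate alpha = 1 - NI of the fraction of adaptive substitutions, and the use of a two-sided Fisher's exact test for significance are common conventions that are assumed here rather than taken from the text.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds); NI < 1 suggests adaptive fixations."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1.0 + 1e-9))


# Hypothetical counts: fixed differences (Dn, Ds) and polymorphisms (Pn, Ps)
dn, ds, pn, ps = 20, 40, 5, 50
ni = neutrality_index(dn, ds, pn, ps)
alpha = 1.0 - ni
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
```

With these hypothetical counts the ratio of non-synonymous to synonymous changes is higher among fixed differences than among polymorphisms, so NI is below one and alpha is positive.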

Quantitative genetic models of adaptation
Quantitative genetic models of adaptation date back to the time before the genetic mechanisms of inheritance were fully discovered [1,94]. Quantitative phenotypes in a population are characterized by a distribution of gradual differences among individuals that are controlled by a multitude of genes. In varying environments, different phenotypes may be favoured. This leads to a change in the population mean phenotype that is known to depend on the additive genetic variation present in the population. When a population deviates from its optimum, mutations are favoured according to their effect size and distance to the optimum. The mean step size of such an adaptive walk has been shown to be approximately exponentially distributed [1]; that is, alleles with larger effects are favoured when the population resides far from the optimum, whereas alleles with smaller effects are favoured during the adaptive fine-tuning close to the population optimum.
The impact of beneficial mutations in the process of adaptation depends on the mutation rate and population size [95]. In humans, most non-synonymous mutations have been shown to be neutral (27% to 29% [33]) or mildly deleterious (30% to 42% [31,33]). In comparison with chimpanzees, 10% to 20% of the fixations appear to be adaptive [33]. However, beneficial mutations that went to fixation in recent time have been shown to be rare (1% [96]), so that adaptation from standing variation may be the most important mode of recent adaptation.
In this scenario, classical selective sweeps play a role only if the beneficial alleles are driven to fixation from low frequency by strong selection [40,97]. Instead, small frequency shifts of selected alleles at the quantitative trait loci, driving a trait value towards its optimum, may predominate.
In case the trait optima of populations are ordered along clines [98,99], effective alleles are expected to change in frequency accordingly [40]. This may be detected by means of the Lewontin and Krakauer test [85] and other F_ST-based statistics (for example, [100]). To be able to distinguish these adaptive frequency changes from drift, Coop et al. [101] proposed a model that analyses whether allele frequencies correlate with environmental variables along a population gradient. A test for polygenic adaptation that also incorporates estimates of phenotypic values from genome-wide association data and compares those with environmental variables has been recently introduced by Berg and Coop [102]. However, phenotypic and genotypic data for many populations are required for this test.

Evidence for adaptation in humans
As the migration out of Africa [103] and the settlement around the world exposed humans to different environmental conditions with regard to temperature, amount of light, humidity, oxygen levels, and agriculture [104], many adaptations in non-African populations must have occurred in the recent past [105]. In line with this, positive selection has been shown to be a less important determinant in various African populations [106]. The most accepted examples from different genome scans show human adaptations to (i) agriculture [104], (ii) environmental variables, such as amount of light, temperature, or oxygen levels, and (iii) pathogen resistance [107][108][109].
The most prominent example of adaptation in humans to agriculture is the ability to digest lactose from milk products in adulthood [110]. Indeed, an extended haplotype homozygosity as a signature for a selective sweep around the LCT gene was observed [22,83]. The activity of the LCT gene is usually reduced in adult mammals [110]. However, the presence of the beneficial mutation provides a selective advantage of about 1.4% to 19% [111]. The most likely explanation for the evolutionary advantage of the mutation is the additional source of calories and calcium it provides, which reduces the risk of diseases related to bone mineralization caused by a lack of vitamin D [110,112]. The frequency of the allele associated with lactase persistence has been shown to decline from Northwest Europe to the southern populations [110] and the mutation is absent in African populations. Rural African populations, however, show strong evidence for parallel adaptation to digest lactose from milk products: other alleles have been associated with lactase persistence [113] that show similar LD patterns and high selection coefficients of 4% to 9% [113,114].
Skin pigmentation is another example of adaptation to environmental conditions in humans. It is known to be controlled by the amount of eumelanin and pheomelanin that are produced in the melanosomes [115,116]. The dark pigmented skin is assumed to be ancestral, whereas lighter pigmented skin has emerged after the migration out of Africa [117]. Skin colour has long been speculated to evolve under positive selection and is another example of convergent evolution [115,117,118]. Many genes have been shown to be associated with variation in skin colour in different human populations [115,116]. The MC1R gene is a main switch in the production of the lighter pheomelanin and darker eumelanin pigments in the melanosomes [116]. Strong selection for the persistence of the dark pigment has been found in African [119] and southern European populations [120]. The gene SLC24A5 regulates calcium levels in melanosomes and has been associated with lighter pigmentation in Europeans [121]. In genome-wide scans, it has been shown that SLC24A5 is surrounded by a region of decreased variability and increased LD levels [22,23,83,117] and is substantially differentiated among different populations [23,105,122]. In East Asian populations, another candidate gene, OCA2, has been shown to be subject to positive selection [118]. Furthermore, there are several other candidate genes, such as UGT1A and BNC2, that are associated with skin pigmentation [123]. However, an adaptive signature has not been observed for these genes yet, most likely because their effect sizes are lower, so that a sweep signature is not established and/or frequency changes are too small to be identified.
Human height is a classical quantitative trait that has been studied since the beginning of the last century [124][125][126] and shows evidence for phenotypic adaptation to different environmental factors, such as temperature (for example, Bergmann's rule [40]), with extreme differences among populations of up to 30 cm [127]. More than 180 loci have been associated with it [128], with no evidence of selective sweeps so far. Turchin et al. [129] demonstrated that alleles that contribute to a tall stature are enriched in northern European populations, which is better explained by small selection coefficients of 0.001% to 0.1% than by drift. Since human height can be expected to be under stabilizing selection [115,116], the probability of observing selective sweeps is rather low [97].
Another example of parallel adaptation, in this case to low oxygen levels at high altitude, has been described in Tibetan, Andean, and Ethiopian populations. Tibetans and Ethiopians adapted differently to the low oxygen levels compared to Andeans [130]. Andeans show an increased haemoglobin blood concentration that elevates the oxygen transport in blood, whereas Tibetans and Ethiopians exhibit an increased lung capacity and breathing rate [130]. The EPAS1 and EGLN1 genes show strong signatures of selective sweeps in Tibetans; that is, an increased differentiation in allele frequency compared to East Asian populations and an increased LD [131][132][133]. Variants of the EPAS1 and EGLN1 genes have been associated with haemoglobin concentration levels in blood [134,135]. It has been shown that the EPAS1 gene has likely been introgressed from archaic humans, the Denisovans, into Tibetans [136]. In the Andean population, different genes (NOS2A and PRKAA1) have been identified as targets of adaptation [131,137]. Ethiopian high-altitude populations, which have a phenotype similar to that of Tibetan populations, also show a different set of genes (CBARA1, VAV3, ARNT2 and THRB) with evidence for positive selection [138]. Variants associated with haemoglobin variation in Tibetans do not overlap with variants associated in Ethiopians [139].

Conclusions
Classical sweeps have been shown to be rare in humans [13,96,105] and, if they exist, they occur around loci with large-effect alleles. As selective sweeps are rare in humans (in contrast to species with large effective population sizes such as Drosophila), the emphasis of human population genetics in the near future must be to identify adaptive signatures for polygenic phenotypes. There is an urgent need for more theoretical modelling and better statistical methods to analyse the evolution of polygenic traits for populations in varying environments and with varying demographies.

Competing interests
The authors declare that they have no competing interests.
Authors' contributions
AW and WS equally participated in drafting the manuscript. Both authors read and approved the final manuscript.