Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing
- Johanna Maria de Gruijter†1, 2,
- Oscar Lao†1Email author,
- Mark Vermeulen1,
- Yali Xue3,
- Cara Woodwark3,
- Christopher J Gillson3,
- Alison J Coffey3,
- Qasim Ayub3,
- S Qasim Mehdi4,
- Manfred Kayser†1 and
- Chris Tyler-Smith†3
© de Gruijter et al; licensee BioMed Central Ltd. 2011
Received: 27 September 2011
Accepted: 1 December 2011
Published: 1 December 2011
Numerous genome-wide scans conducted by genotyping previously ascertained single-nucleotide polymorphisms (SNPs) have provided candidate signatures for positive selection in various regions of the human genome, including in genes involved in pigmentation traits. However, it is unclear how well the signatures discovered by such haplotype-based test statistics can be reproduced in tests based on full resequencing data. Four genes (oculocutaneous albinism II (OCA2), tyrosinase-related protein 1 (TYRP1), dopachrome tautomerase (DCT), and KIT ligand (KITLG)) implicated in human skin-color variation, have shown evidence for positive selection in Europeans and East Asians in previous SNP-scan data. In the current study, we resequenced 4.7 to 6.7 kb of DNA from each of these genes in Africans, Europeans, East Asians, and South Asians.
Applying all commonly used neutrality-test statistics for allele frequency distribution to the newly generated sequence data provided conflicting results regarding evidence for positive selection. Previous haplotype-based findings could not be clearly confirmed. Although some tests were marginally significant for some populations and genes, none of them were significant after multiple-testing correction. Combined P values for each gene-population pair did not improve these results. Application of Approximate Bayesian Computation Markov chain Monte Carlo based to these sequence data using a simple forward simulator revealed broad posterior distributions of the selective parameters for all four genes, providing no support for positive selection. However, when we applied this approach to published sequence data on SLC45A2, another human pigmentation candidate gene, we could readily confirm evidence for positive selection, as previously detected with sequence-based and some haplotype-based tests.
Overall, our data indicate that even genes that are strong biological candidates for positive selection and show reproducible signatures of positive selection in SNP scans do not always show the same replicability of selection signals in other tests, which should be considered in future studies on detecting positive selection in genetic data.
List of abbreviations
approximate Bayesian computation
Brahui from Pakistan
Council for Education on Public Health
Han Chinese from Beijing
Encyclopedia of DNA Elements (ENCODE) data
extended haplotype homozygosity
Japanese from Tokyo
long-range haplotype test
Markov chain Monte Carlo
out of Africa
polymerase chain reaction
Yoruba from Ibadan, Nigeria.
Large-scale genotyping projects using genome-wide single-nucleotide polymorphisms (SNPs) have provided large amounts of data describing the genetic diversity of human populations [1–6]. Several statistical methods have been developed and used for detection of signatures of selective processes from genome-wide SNP data, which we refer to as 'SNP scans' . All these approaches try to recover fingerprints of selective sweeps by detecting signals in the haplotypic variation of a genomic region and/or the spectrum of the variation of the genetic diversity [8–15]. However, the results obtained with the different test statistics usually show limited overlap (see results from Voight et al.  Wang et al.  and Akey ), therefore, it would be desirable to compare the results from SNP haplotype-based tests using the 'gold standard' of full resequencing data and suitable statistical tests. The most widely used approach involves computing a set of neutrality-test values for sequence data from a particular genomic region, and then estimating the likelihood of such values are under neutrality. This is achieved either by comparing the computed one in the region of interest with the value of the statistic in other regions of the genome (that is, by using empirical distributions ) or with the values obtained under demographic (neutral) simulations ). However, it is also desirable to have an estimate of the selective parameters (s and h) rather than just rejecting the hypothesis of neutrality. The latter could be in principle obtained by applying Approximate Bayesian Computation (ABC) , a statistical technique used to recover the posterior distribution of parameters shaping the statistical model, which is applied when computing the likelihood of the data given the parameters is not possible, but it is possible to simulate data under the model of interest . ABC has proven to be a valuable statistical tool for making inferences about demographic parameters in population genetics. Moreover, it has also been used in estimating selective parameters [20, 21].
Skin pigmentation is an excellent candidate system for investigating positive selection. It is very likely that this trait is under selective pressure, given the biological role of pigmentation and the large differences seen in pigmentation intensity between continental populations . For a large number of genes involved in the pigmentation pathway, signatures of recent selective sweeps have been suggested from SNP-scan data [12, 14, 23–26], and DNA sequence-based evidence has been tested for a limited number of the genes associated with human skin pigmentation (melanocortin 1 receptor (MC1R), solute carrier family 45 member 2 (SLC45A2), tyrosinase-related protein 1 (TYRP1, dopachrome tautomerase (DCT) and tyrosinase (TYR)[27–30]. In a previous SNP-based study, we identified signatures of selective sweeps in the genes oculocutaneous albinism II (OCA2),TYRP1, DCT, and KIT ligand (KITLG) in Europeans, in OCA2, DCT, KITLG, epidermal growth factor receptor (EGFR) and dopamine receptor D2 (DRD2) in East Asians. In contrast, Africans did not show any evidence of positive selection in any of the genes that we tested . Evidence for selection in OCA2, DCT, KITLG, and TYRP1 was also shown by other SNP-based studies applying similar haplotype-based tests to data from other samples [23–26].
In the present study, we generated DNA sequence data from approximately 4.7 to 6.7 kb of each of the four genes OCA2, TYRP1, DCT and KITLG in Africans, Europeans, East Asians and South Asians, and applied both neutrality tests and an ABC-Markov chain Monte Carlo (MCMC) approach for detecting evidence of positive selection. Furthermore, we compared the outcomes from such sequence-based test with our previous results from haplotype-based tests using SNP-scan data.
Diversity statistics estimated for four major pigmentation genes and four worldwide populations
Gene (number of base pairs sequenced)
Sample size, n
Polymorphic sites, n
Nucleotide diversity (×10-4)
Watterson's estimator ()
×10-4 per site
East Asian (CHB)
South Asian (BRU)
East Asian (CHB)
South Asian (BRU)
East Asian (CHB)
South Asian (BRU)
East Asian (CHB)
South Asian (BRU)
Neutrality tests and their statistical significance
Neutrality-test statistics of each gene in each population.a
Fu and Li's
Fay and Wu's H
East Asian (CHB)
East Asian (CHB)
East Asian (CHB)
East Asian (CHB)
Approximate Bayesian computation/Markov chain Monte Carlo analyses
Median and dispersion statistics of the posterior distributions of the demographic parameters
(2400 - 8930)
(130 - 6900)
(1920 - 7650)
(1420 - 8890)
(2160 - 8900)
(11,500 - 63,250)
(50,250 - 76,000)
Median and 95% Credible Interval (CI) of the estimates of the time when selection started
Population under positive selectiona
Time of selection (95% CI)
σ (95% CI)
50,250 (13,500 to 97,250)
361.94 (55.6 to 984.79)
1.70 (0.71 to 4.28)
Europe,b East Asia
45,875 (12,750 to 95,006)
383.42 (36.07 to 992.04)
2.04 (0.78 to 4.56)
Europe,b East Asiab
46,000 (11,750 to 93,269)
385.49 (37.68 to 1036.32)
2.01 (0.74 to 4.41)
48,500 (11,250 to 97,006)
789.24 (55.21 to 1584.98)
1.90 (0.67 to 4.11)
27,500 (10,500 to 73,294)
676.53 (127.73 to 1026.06)
1.10 (0.53 to 2.60)
44,875 (12500 to 95,756.25)
593.18 (47.99 to 1086.22)
1.69 (0.56 to 3.97)
In the present study, we focused on reinvestigating previous conclusions about positive selection based on long-range haplotype (LRH) tests, using four genes putatively associated with human pigmentation. The phylogenetic networks for each gene based on sequence data were in agreement with our previous findings ; the different populations tended to show a high frequency of one of the major haplotypes, which tended to diverge from the others by a large number of mutations, and the single SNP differentiation between populations was also in agreement with previous results. This was particularly evident for OCA2, KITLG and DCT in European and Asian populations, and less evident in the African population. The Pakistani population, geographically situated between the Europeans and Asians, shares the main haplotypes with these two populations. The presence of long network branches within each population can be indicative of balancing selection ; however, we failed to replicate previous LRH findings with the sequence-based tests, and we observed dependence of the statistical significance of the sequence-based tests on which neutrality distribution was used. Only the KITLG in the European population had statistical departures from neutrality in the CM and GM, which is in agreement with the outcome from the LRH test, but neutrality could not be rejected using ENCODE data. Furthermore, we were not able to replicate a previously suggested signal for the DCT gene in Asian populations . As the whole DCT gene is a single linkage disequilibrium (LD) block, as seen in the HapMap East Asian data, it seems unlikely that the discrepancy is explicable by the different DCT regions sequenced. Indeed, the agreement between different SNP-scan studies has been described as 'underwhelming' .
The discrepancies we detected here between haplotype-based and sequence-based test outcomes can be explained by a number of factors.
First, we cannot exclude the possibility that the positive-selection signals from our previous SNP-based study were false positives; the complex demographic history of humans [12, 32, 33] and the power dependency of the tested site  can affect the outcome of such tests.
Second, it has been emphasized that the SNP ascertainment bias introduced during marker discovery [35, 36] and genotyping array can lead to spurious false-positive findings in haplotype-based tests [37, 38].
Third, there might be a lack of power in the sequence-based tests because of the small sample sizes and/or small sequenced regions . Although we cannot exclude this possibility, the length (approximately 5 kb) sequenced from the four genes proved to be sufficient to detect departures from neutrality in SLC45A2 in the European population (data not shown).
A fourth possibility is that the distributions that we computed for each statistic under neutrality do not represent the true underlying distribution for the human species. Parameters of the demographic events need to be defined a priori, which in humans is challenging because of the complex history of migrations, admixture, expansions and bottlenecks . The differences seen in the values of the parameters could be indeed a major source of variation. The ENCODE data we used as an alternative is hampered by the fact that the considered regions were ascertained based on their genomic peculiarities , and they may not be representative of the genetic variability of the genome. There has been progress in resequencing entire genomes (for example, the 1000 Genomes Project; http://www.1000genomes.org/page.php); however, current projects rely on combining low-coverage data from multiple samples, and are not able to produce the accurate sequence for each genome that is needed for such comparisons .
The fifth, and perhaps most likely, reason for discrepancies between LRH and sequence-based tests we observed here may be the different underlying assumptions of the evolutionary models used (that is, instantaneous selective sweep versus incomplete selective sweeps) in the definition of each statistic, and the evolutionary timescale over which each type of test can recover departures from neutrality . In that case, our results might indicate an extremely recent selection in the pigmentation genes, which would be recovered by haplotype-based but not sequence-based tests.
We also used a Bayesian approach to estimate selective parameters of the populations putatively under positive selection in each gene. The demographic parameters were on average in concordance with those described in previous studies [43–46]; however, it should be noted that they were not entirely comparable as there were a large number of differences in the assumptions of the models and data. Despite this technical limitation of the approach, the estimates of the time when selection started and the mode of inheritance correlated well with expectations in the case of SLC45A2, independently of whether the complete 10 kb sequence or a subsample of 5 kb was used (data not shown). To our knowledge, this is the first time this known selective sweep has been quantified in such a way. However, for OCA2, TYRP1, DCT and KITLG and the neutral simulated region, the strong resemblance between the prior and the posterior distributions suggests that the latter are mainly dominated by the priors rather than by the information contained in the genetic data.
In this study, we have shown that using sequence-based neutrality tests to confirm signatures of positive selection derived from LRH tests on SNP-scan data can be difficult, even though there is a strong likelihood that the skin-pigmentation genes we studied have been targets of selection . Deciphering whether this is a consequence of the power of the different statistics to detect the fingerprint of selection on different timescales; different assumptions on the strength of the selective event; or lack of power due to experimental limitations seems challenging.
Our findings should be considered in future studies that set out to further investigate signatures of selection in other genes or regions of the human genome suggested by SNP-scan data. It may be argued that the final proof of positive selection should not be provided by additional genetic data but rather by functional evidence, but this may have its own caveats. A recent study  has demonstrated that the TRPV6 gene shows strong evidence of positive selection in all non-African populations tested using a novel modification of the extended haplotype homozygosity (EHH) test, but no functional differences between the ancestral and derived sequence could be detected using experiments relevant to the known gene function. Clearly, further advancements in the methods used to detect and validate putative signatures of positive selection are needed, and provide one of the most exciting areas for future developments. Complete understanding of positive selection in the human genome will require the combination of multiple lines of evidence.
Genomic DNAs (Coriell Institute for Medical Research; Camden, NJ, USA from randomly ascertained participants from three continents chosen from the HapMap panel  were used: 24 Yoruba from Ibadan, Nigeria (YRI); 24 Han Chinese from Beijing (CHB); and 24 CEPH Utah residents with ancestry from northern and western Europe (CEU). In addition, 26 (OCA2 and TYRP1) or 25 (DCT and KITLG) Brahui (BRU) from Pakistan  were included because of their distinct and intermediate geographic location and history. In some analyses, we used published data on SLC42A2 (Table 3 of Soejima et al.) in European-Africans from Cape Town, Ghanaians from Accra, and Chinese from Guangzhou; these data had excluded SNPs at a frequency of <5%. Genomic DNA samples from one chimpanzee (Pan troglodytes), one orangutan (Pongo pygmaeus) and one gorilla (Gorilla gorilla) (all from the European Collection of Cell Cultures, Salisbury, Wiltshire, UK) were also included to allow inference of the ancestral state of each SNP.
Ascertainment of the sequenced regions
Enzymatic amplification by PCR
Primers were designed from the human reference sequence obtained from GenBank (OCA2 accession number NC_000015.8; TYRP1 accession number NC_000009.10; DCT accession number NC_000013.9 and KITLG accession number NC_000012.9), and used to amplify fragments of approximately 6 to 7 kb (positions on chromosome: OCA2 25959080-25966266; TYRP1 12694505-12700830; DCT 93890084-93895889; KITLG 87482044-87476046) covering the chosen regions. PCR assays were performed using a volume of 25 μl containing approximately 40 ng of genomic DNA, 2 μmol/l of each primer, 200 μmol/l of each dNTP, 2 mmol/l MgSO4, 0.5 U high-fidelity Taq polymerase (Platinum Taq; Invitrogen Corp., Carlsbad, CA, USA). Amplification was performed in a themal cycler (Peltier; MJ Research Inc., Waltham, MA, USA) using the following cycling profile: 94°C for 2 minutes, followed by 35 cycles at 94°C for 30 seconds, 59°C (OCA2, DCT and KITLG) or 56°C (TYRP1) for 1 minute, and 72°C for 1 minute, and a final step at 72°C for 5 minutes. Subsequently, 3 μl samples of a 1:200 dilution of these PCR products were used as templates to reamplify overlapping fragments with sizes of approximately 350 to 700 bp. PCR assays for reamplification were performed in a volume of 15 μl containing 2 μmol/l of each primer, 1.6 mmol/l MgCl2, 200 μmol/l of each dNTP, and 0.5 U Taq (Platinum Taq; Invitrogen). The cycling conditions for the reamplification were 94°C for 2 minutes, followed by 31 cycles at 94°C for 45 seconds, 60°C for 45 seconds, and 72°C for 1.5 minutes; then 72°C for 3 minutes. Details of all primers used in this study are given in Additional File 4.
PCR products were purified (ExoSAP-IT; GE Healthcare, Princeton, NJ, USA) before sequencing. The primers used for reamplification (see Additional File 4) were used (individually) to sequence in both orientations, using a large-scale sequencing facility (Wellcome Trust Sanger Institute) with standard capillary methods. For each individual, each nucleotide position was determined from both strands by at least two reads each. The genomic DNA sequence for OCA2 (accession number NC_000015.8), TYRP1 (accession number NC_000009.10), DCT (accession number NC_000013.9) and KITLG (accession number NC_000012.9) were obtained from GenBank and used as the reference sequence for the relevant gene. Sequence traces were processed using the software program ExoTrace (Wellcome Trust Sanger Institute; http://www.sanger.ac.uk/ humgen/exoseq/analysis.shtml). Potentially polymorphic positions were flagged by the program, and were then checked manually. Variable positions were compared in overlapping and complementary reads for all samples. A quality-control test was performed to verify the SNPs called in each gene by comparing the genotype assignment performed by one investigator with the same genotype assignment performed by a second investigator. This test was performed on 2 kb of sequence per gene. The largest discrepancies were seen in the gene TYRP1, (3.4% discrepancies over all the polymorphic sites examined) and DCT (1.3%); KITLG and OCA2 rates of 0.28% and 0.6%, respectively. All of these discrepancies reflected differences in investigator interpretation, and all could be resolved by re-examining the traces. Therefore, for the rest of the sequenced regions, a maximum number of plausible variants was first identified by the two investigators, and the genotypes were then confirmed by a third investigator. In addition, genotypes of previously identified SNPs were compared with the genotype of the same individual in the HapMap database; 97% of the genotypes corresponded with each other, which is similar to the accuracy of the HapMap data themselves .
Data analysis and coalescent simulations
Haplotypes were reconstructed using PHASE software (version 2.02 (http://depts.washington.edu/uwc4c/express-licenses/assets/phase/) [52, 53]. The four-gamete rule was computed for each pair of loci in each gene. A value of 1 was assigned when the four-gamete rule could not be rejected and 0 when it could. LDheatmap http://stat-db.stat.sfu.ca:8080/statgen/research/LDheatmap/ was used to plot the matrix of loci for each gene. Measures of genetic diversity, including nucleotide diversity (π)  and Watterson's estimator, , were calculated with DnaSP (version 4.10; http://www.ub.edu/dnasp) . Maximum parsimony networks (using the median-joining algorithm) were constructed using the Network 4.1 software package http://www.fluxus-technology.com. Tajima's D ; Fu and Li's D* and F* (no outgroup), and D and F (with outgroup ; Fay and Wu's H ; Fu's Fs ; and Ewens and Watterson homozygosity-test  statistics were estimated using a custom script to automate the computations. To test the reliability of these computations, we first compared the results obtained using our script with those obtained with DnaSP (version 4.10)  for some dummy data files, and saw no discrepancies. The observed values of these neutrality-test statistics in the European, Asian and African populations were compared with the values of the same neutrality-test statistics based on 10,000 coalescent simulations under the best-fit complex demographic history assuming an OOA model (the CM) implemented in the cosi software package (http://www.broadinstitute.org/~sfs/cosi/) . We also estimated the departure of neutrality using the demographic model proposed by Gutenkunst et al. (GM) using the ms software with the syntax described in that paper (, supplementary material). For each statistic, a two-tailed P value was calculated  and for each population and gene, a combined P value was obtained .
Data analysis by comparison with ENCODE data
DNA sequence information from three different regions of 500 kb each, free or almost free of genes, were obtained from the ENCODE project (EP; http://www.genome.gov/10005107) , comprising regions Enr112, Enr113 and Enr213. These regions have been resequenced by the ENCODE project in several samples from the populations (YRI, CEU, CHB, and JPT) present in the HapMap project . We then performed 10,000 re-samples for each population (YRI, CEU, and CHB) and gene by dividing each of the 500 kb regions into bins of size corresponding to the sequenced region of interest, and taking at random the same number of chromosomes per population as sequenced in each of the four genes. For each bin, we computed Tajima's D, Fu and Li's D*, Fu and Li's F*, Fu's Fs and the Ewens-Watterson homozygosity-test statistics, and compared them with those computed in the gene and population of interest. These statistics were preferred over the others because they do not require the ancestral state of each SNP, which is difficult to estimate for some of the SNPs found through the ENCODE project.
Approximate Bayesian computation/Markov chain Monte Carlo
We also implemented an MCMC without likelihoods in a similar way to the algorithm proposed by Marjoram et al. . For details of the implementation of the ABC-MCMC, choice of summary statistics and the forward simulator we have used, see Additional File 1.
To estimate the strength of s and h and the time when selection started in each of the four genes putatively under selection, the SLC45A2 gene (both the full sequenced region and a randomly ascertained region of 5 kb), and the outcome from a neutral simulation of 5 kb with the OOA model (s = 0, h = 0), we used a two-step approach. In the first step, we performed ABC-MCMC using observed data for 50 regions of 10 kb each from the genes sequenced (see Additional file 5) in the same three populations by the Environmental Genome Project , which has previously been used to quantify demographic parameters . Absence of recombination in each 10 kb segment was tested by applying the four-gamete rule in each population. The demographic model we used here is a simplification of the OOA model (Figure 2) . The choice of a simple model in our study was motivated by the computational time of the forward simulations when increasing the complexity of the demographic parameters (such as bottlenecks and population expansions). As the topology of the coalescence tree is defined with a small number of sequences , but estimating the number of differences between pairs of sequences is performed in quadratic time, we only used 10 samples per population in order to minimize the number of computations. All the demographic parameters were scaled by a factor of 10 to reduce computational time  and the mutation rate per nucleotide was thus 2.35 × 10-7. In total, 10,000 simulations were performed, with a burn-in of 1,000 and a thinning of 9. Prior distributions for all the demographic parameters were uniform (Table 3). In the second step, we set as the constant the demographic parameters with the median value from the estimated posterior distributions, and estimated the selective parameters of each of the genomic regions under study, using all the chromosomes per population. For each genomic region, the population under selection was set as the population previously described as being under selective pressure. In all the analyses, prior distributions (without scaling) were: truncated normal (mean ± sd 1810 ± 800 generations, range 400 to 4000) for tsel, truncated normal (mean ± sd -4 ± 3, range -4 to 2) for s in logit form, and truncated normal (mean ± sd 1 ± 1, range 0.5 to 6) for the h parameter. In total, 20,000 simulations were performed with a burn-in of 5,000 and a thinning of 15. Prior distributions of s, h and tsel under selection were obtained by taking a sample from the prior distributions, performing a simulation, and retaining the sampled values if the simulated data contained the mutation under selection. Using this procedure, 1,000 simulations were retained. We repeated the procedure twice, considering CEU and CHB as being under positive or balancing selective pressures depending on the overdominance parameter.
We thank Stefano Mona for stimulating discussions and suggestions about the ABC approach, Mirjam van der Hout for support and computer supplies when running initial implementations of ABC, the Wellcome Trust Sanger Institute large-scale sequencing facility for support in data generation, and Steven Leonard for help with ExoTrace. We also thank the International HapMap project and BRU donors for making their DNA samples available. This study was supported by The Wellcome Trust and by the Erasmus University Medical Center and partially by funds from the Netherlands Forensic Institute.
- International HapMap Consortium: A haplotype map of the human genome. Nature. 2005, 437: 1299-1320. 10.1038/nature04226.
- Hinds DA, Stuve LL, Nilsen GB, Halperin E, Eskin E, Ballinger DG, Frazer KA, Cox DR: Whole-genome patterns of common DNA variation in three human populations. Science. 2005, 307: 1072-1079. 10.1126/science.1105436.View ArticlePubMed
- Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, et al: Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008, 451: 998-1003. 10.1038/nature06742.View ArticlePubMed
- Lao O, Lu TT, Nothnagel M, Junge O, Freitag-Wolf S, Caliebe A, Balascakova M, Bertranpetit J, Bindoff LA, Comas D, et al: Correlation between genetic and geographic structure in Europe. Curr Biol. 2008, 18: 1241-1248. 10.1016/j.cub.2008.07.049.View ArticlePubMed
- Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM: Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008, 319: 1100-1104. 10.1126/science.1153717.View ArticlePubMed
- Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, et al: Genes mirror geography within Europe. Nature. 2008, 456: 98-101. 10.1038/nature07331.PubMed CentralView ArticlePubMed
- Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D, Lander ES: Positive natural selection in the human lineage. Science. 2006, 312: 1614-1620. 10.1126/science.1124309.View ArticlePubMed
- Kimura R, Fujimoto A, Tokunaga K, Ohashi J: A practical genome scan for population-specific strong selective sweeps that have reached fixation. PLoS ONE. 2007, 2: e286-10.1371/journal.pone.0000286.PubMed CentralView ArticlePubMed
- Nielsen R, Hellmann I, Hubisz M, Bustamante C, Clark AG: Recent and ongoing selection in the human genome. Nat Rev Genet. 2007, 8: 857-868.PubMed CentralView ArticlePubMed
- Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R, et al: Genome-wide detection and characterization of positive selection in human populations. Nature. 2007, 449: 913-918. 10.1038/nature06250.PubMed CentralView ArticlePubMed
- Tang K, Thornton KR, Stoneking M: A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol. 2007, 5: e171-10.1371/journal.pbio.0050171.PubMed CentralView ArticlePubMed
- Voight BF, Kudaravalli S, Wen X, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4: e72-10.1371/journal.pbio.0040072.PubMed CentralView ArticlePubMed
- Wang ET, Kodama G, Baldi P, Moyzis RK: Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA. 2006, 103: 135-140. 10.1073/pnas.0509691102.PubMed CentralView ArticlePubMed
- Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R: Localizing recent adaptive evolution in the human genome. PLoS Genet. 2007, 3: e90-10.1371/journal.pgen.0030090.PubMed CentralView ArticlePubMed
- Zhang C, Bailey DK, Awad T, Liu G, Xing G, Cao M, Valmeekam V, Retief J, Matsuzaki H, Taub M, et al: A whole genome long-range haplotype (WGLRH) test for detecting imprints of positive selection in human populations. Bioinformatics. 2006, 22: 2122-2128. 10.1093/bioinformatics/btl365.View ArticlePubMed
- Akey JM: Constructing genomic maps of positive selection in humans: where do we go from here?. Genome Res. 2009, 19: 711-722. 10.1101/gr.086652.108.PubMed CentralView ArticlePubMed
- Quach H, Barreiro LB, Laval G, Zidane N, Patin E, Kidd KK, Kidd JR, Bouchier C, Veuille M, Antoniewski C, Quintana-Murci L: Signatures of purifying and local positive selection in human miRNAs. Am J Hum Genet. 2009, 84: 316-327. 10.1016/j.ajhg.2009.01.022.PubMed CentralView ArticlePubMed
- Stefansson H, Helgason A, Thorleifsson G, Steinthorsdottir V, Masson G, Barnard J, Baker A, Jonasdottir A, Ingason A, Gudnadottir VG, et al: A common inversion under selection in Europeans. Nat Genet. 2005, 37: 129-137. 10.1038/ng1508.View ArticlePubMed
- Beaumont MA, Zhang W, Balding DJ: Approximate bayesian computation in population genetics. Genetics. 2002, 162: 2025-2035.PubMed CentralPubMed
- Przeworski M: Estimating the time since the fixation of a beneficial allele. Genetics. 2003, 164: 1667-1676.PubMed CentralPubMed
- Jensen JD, Thornton KR, Andolfatto P: An approximate bayesian estimator suggests strong, recurrent selective sweeps in Drosophila. PLoS Genet. 2008, 4: e1000198-10.1371/journal.pgen.1000198.PubMed CentralView ArticlePubMed
- Parra EJ: Human pigmentation variation: evolution, genetic basis, and implications for public health. Am J Phys Anthropol. 2007, Suppl 45: 85-105.View ArticlePubMed
- Izagirre N, Garcia I, Junquera C, de la Rua C, Alonso S: A scan for signatures of positive selection in candidate loci for skin pigmentation in humans. Mol Biol Evol. 2006, 23: 1697-1706. 10.1093/molbev/msl030.View ArticlePubMed
- Lao O, de Gruijter JM, van Duijn K, Navarro A, Kayser M: Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms. Ann Hum Genet. 2007, 71: 354-369. 10.1111/j.1469-1809.2006.00341.x.View ArticlePubMed
- Myles S, Somel M, Tang K, Kelso J, Stoneking M: Identifying genes underlying skin pigmentation differences among human populations. Hum Genet. 2007, 120: 613-621.View ArticlePubMed
- Norton HL, Kittles RA, Parra E, McKeigue P, Mao X, Cheng K, Canfield VA, Bradley DG, McEvoy B, Shriver MD: Genetic evidence for the convergent evolution of light skin in Europeans and East Asians. Mol Biol Evol. 2007, 24: 710-722.View ArticlePubMed
- Rana BK, Hewett-Emmett D, Jin L, Chang BH, Sambuughin N, Lin M, Watkins S, Bamshad M, Jorde LB, Ramsay M, et al: High polymorphism at the human melanocortin 1 receptor locus. Genetics. 1999, 151: 1547-1557.PubMed CentralPubMed
- Harding RM, Healy E, Ray AJ, Ellis NS, Flanagan N, Todd C, Dixon C, Sajantila A, Jackson IJ, Birch-Machin MA, Rees JL: Evidence for variable selective pressures at MC1R. Am J Hum Genet. 2000, 66: 1351-1361. 10.1086/302863.PubMed CentralView ArticlePubMed
- Alonso S, Izagirre N, Smith-Zubiaga I, Gardeazabal J, Diaz-Ramon JL, Diaz-Perez JL, Zelenika D, Boyano MD, Smit N, de la Rua C: Complex signatures of selection for the melanogenic loci TYR, TYRP1 and DCT in humans. BMC evolutionary biology. 2008, 8: 74-10.1186/1471-2148-8-74.PubMed CentralView ArticlePubMed
- Soejima M, Tachida H, Ishida T, Sano A, Koda Y: Evidence for recent positive selection at the human AIM1 locus in a European population. Mol Biol Evol. 2006, 23: 179-188.View ArticlePubMed
- Andres AM, Hubisz MJ, Indap A, Torgerson DG, Degenhardt JD, Boyko AR, Gutenkunst RN, White TJ, Green ED, Bustamante CD, et al: Targets of balancing selection in the human genome. Mol Biol Evol. 2009, 26: 2755-2764. 10.1093/molbev/msp190.PubMed CentralView ArticlePubMed
- Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, et al: Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002, 419: 832-837. 10.1038/nature01140.View ArticlePubMed
- Teshima KM, Coop G, Przeworski M: How reliable are empirical genomic scans for selective sweeps?. Genome Res. 2006, 16: 702-712. 10.1101/gr.5105206.PubMed CentralView ArticlePubMed
- Zeng K, Mano S, Shi S, Wu CI: Comparisons of site- and haplotype-frequency methods for detecting positive selection. Mol Biol Evol. 2007, 24: 1562-1574. 10.1093/molbev/msm078.View ArticlePubMed
- Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R: Ascertainment bias in studies of human genome-wide polymorphism. Genome Res. 2005, 15: 1496-1502. 10.1101/gr.4107905.PubMed CentralView ArticlePubMed
- Nielsen R, Signorovitch J: Correcting for ascertainment biases when analyzing SNP data: applications to the estimation of linkage disequilibrium. Theor Popul Biol. 2003, 63: 245-255. 10.1016/S0040-5809(03)00005-4.View ArticlePubMed
- Kreitman M, Di Rienzo A: Balancing claims for balancing selection. Trends Genet. 2004, 20: 300-304. 10.1016/j.tig.2004.05.002.View ArticlePubMed
- Soldevila M, Calafell F, Helgason A, Stefansson K, Bertranpetit J: Assessing the signatures of selection in PRNP from polymorphism data: results support Kreitman and Di Rienzo's opinion. Trends Genet. 2005, 21: 389-391. 10.1016/j.tig.2005.05.001.View ArticlePubMed
- Simonsen KL, Churchill GA, Aquadro CF: Properties of statistical tests of neutrality for DNA polymorphism data. Genetics. 1995, 141: 413-429.PubMed CentralPubMed
- Harding RM, McVean G: A structured ancestral population for the evolution of modern humans. Curr Opin Genet Dev. 2004, 14: 667-674. 10.1016/j.gde.2004.08.010.View ArticlePubMed
- Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447: 799-816. 10.1038/nature05874.View ArticlePubMed
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456: 53-59. 10.1038/nature07517.PubMed CentralView ArticlePubMed
- Fagundes NJ, Ray N, Beaumont M, Neuenschwander S, Salzano FM, Bonatto SL, Excoffier L: Statistical evaluation of alternative models of human evolution. Proc Natl Acad Sci USA. 2007, 104: 17614-17619. 10.1073/pnas.0708280104.PubMed CentralView ArticlePubMed
- Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD: Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009, 5: e1000695-10.1371/journal.pgen.1000695.PubMed CentralView ArticlePubMed
- Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, Altshuler D: Calibrating a coalescent simulation of human genome sequence variation. Genome Res. 2005, 15: 1576-1583. 10.1101/gr.3709305.PubMed CentralView ArticlePubMed
- Wall JD, Lohmueller KE, Plagnol V: Detecting ancient admixture and estimating demographic parameters in multiple human populations. Mol Biol Evol. 2009, 26: 1823-1827. 10.1093/molbev/msp096.PubMed CentralView ArticlePubMed
- Jablonski NG: The evolution of human skin and skin color. Annual Review of Anthropology. 2004, 33: 585-623. 10.1146/annurev.anthro.33.070203.143955.View Article
- Hughes DA, Tang K, Strotmann R, Schoneberg T, Prenen J, Nilius B, Stoneking M: Parallel selection on TRPV6 in human populations. PLoS ONE. 2008, 3: e1686-10.1371/journal.pone.0001686.PubMed CentralView ArticlePubMed
- Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, Zerjal T, Tyler-Smith C, Mehdi SQ: Y-chromosomal DNA variation in Pakistan. Am J Hum Genet. 2002, 70: 1107-1124. 10.1086/339929.PubMed CentralView ArticlePubMed
- Rosenberg NA, Li LM, Ward R, Pritchard JK: Informativeness of genetic markers for inference of ancestry. Am J Hum Genet. 2003, 73: 1402-1422. 10.1086/380416.PubMed CentralView ArticlePubMed
- Sturm RA: Molecular genetics of human pigmentation diversity. Hum Mol Genet. 2009, 18: R9-17. 10.1093/hmg/ddp003.View ArticlePubMed
- Stephens M, Donnelly P: A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet. 2003, 73: 1162-1169. 10.1086/379378.PubMed CentralView ArticlePubMed
- Stephens JC, Schneider JA, Tanguay DA, Choi J, Acharya T, Stanley SE, Jiang R, Messer CJ, Chew A, Han JH, et al: Haplotype variation and linkage disequilibrium in 313 human genes. Science. 2001, 293: 489-493. 10.1126/science.1059431.View ArticlePubMed
- Shin J-H, Blay S, McNeney B, Graham J: LDheatmap: An R function for graphical display of pairwise linkage disequilibria between single nucleotide polymorphisms. Journal of Statistical Software. 2006, 16: Code Snippet 3
- Nei M: Molecular Evolutionary Genetics. 1987, New York: Columbia University Press
- Watterson GA: On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975, 7: 256-276. 10.1016/0040-5809(75)90020-9.View ArticlePubMed
- Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003, 19: 2496-2497. 10.1093/bioinformatics/btg359.View ArticlePubMed
- Tajima FV: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.PubMed CentralPubMed
- Fu YX, Li WH: Statistical tests of neutrality of mutations. Genetics. 1993, 133: 693-709.PubMed CentralPubMed
- Fay JC, Wu CI: Hitchhiking under positive Darwinian selection. Genetics. 2000, 155: 1405-1413.PubMed CentralPubMed
- Fu YX: Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics. 1997, 147: 915-925.PubMed CentralPubMed
- Watterson GA: The homozygosity test of neutrality. Genetics. 1978, 88: 405-417.PubMed CentralPubMed
- Voight BF, Adams AM, Frisse LA, Qian Y, Hudson RR, Di Rienzo A: Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proc Natl Acad Sci USA. 2005, 102: 18508-18513. 10.1073/pnas.0507325102.PubMed CentralView ArticlePubMed
- Wilson DJ, McVean G: Estimating diversifying selection and functional constraint in the presence of recombination. Genetics. 2006, 172: 1411-1425.PubMed CentralView ArticlePubMed
- Marjoram P, Molitor J, Plagnol V, Tavare S: Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA. 2003, 100: 15324-15328. 10.1073/pnas.0306899100.PubMed CentralView ArticlePubMed
- Livingston RJ, von Niederhausern A, Jegga AG, Crawford DC, Carlson CS, Rieder MJ, Gowrisankar S, Aronow BJ, Weiss RB, Nickerson DA: Pattern of sequence variation across 213 environmental response genes. Genome Res. 2004, 14: 1821-1831. 10.1101/gr.2730004.PubMed CentralView ArticlePubMed
- DeGiorgio M, Jakobsson M, Rosenberg NA: Out of Africa: modern human origins special feature: explaining worldwide patterns of human genetic variation using a coalescent-based serial founder model of migration outward from Africa. Proc Natl Acad Sci USA. 2009, 106: 16057-16062. 10.1073/pnas.0903341106.PubMed CentralView ArticlePubMed
- Rosenberg NA, Nordborg M: Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet. 2002, 3: 380-390. 10.1038/nrg795.View ArticlePubMed
- Hoggart CJ, Chadeau-Hyam M, Clark TG, Lampariello R, Whittaker JC, De Iorio M, Balding DJ: Sequence-level population simulations over large genomic regions. Genetics. 2007, 177: 1725-1731. 10.1534/genetics.106.069088.PubMed CentralView ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.