In the present study, we reinvestigated previous conclusions about positive selection that were based on long-range haplotype (LRH) tests, using four genes putatively associated with human pigmentation. The phylogenetic networks for each gene based on sequence data agreed with our previous findings; the different populations each tended to show a high frequency of one of the major haplotypes, which tended to diverge from the others by a large number of mutations, and the single-SNP differentiation between populations also agreed with previous results. This was particularly evident for OCA2, KITLG and DCT in the European and Asian populations, and less evident in the African population. The Pakistani population, geographically situated between the Europeans and Asians, shares the main haplotypes with these two populations. The presence of long network branches within each population can be indicative of balancing selection; however, we failed to replicate previous LRH findings with the sequence-based tests, and we observed that the statistical significance of the sequence-based tests depended on which neutral distribution was used. Only KITLG in the European population showed statistically significant departures from neutrality under the CM and GM, in agreement with the outcome of the LRH test, but neutrality could not be rejected using the ENCODE data. Furthermore, we were not able to replicate a previously suggested signal for the DCT gene in Asian populations. As the whole DCT gene is a single linkage disequilibrium (LD) block, as seen in the HapMap East Asian data, it seems unlikely that the discrepancy can be explained by differences in the DCT regions sequenced. Indeed, the agreement between different SNP-scan studies has been described as 'underwhelming'.
The discrepancies we detected here between haplotype-based and sequence-based test outcomes can be explained by a number of factors.
First, we cannot exclude the possibility that the positive-selection signals from our previous SNP-based study were false positives; the complex demographic history of humans [12, 32, 33] and the dependence of test power on the site tested can both affect the outcome of such tests.
Second, it has been emphasized that the SNP ascertainment bias introduced during marker discovery [35, 36] and array-based genotyping can lead to false-positive findings in haplotype-based tests [37, 38].
Third, the sequence-based tests might lack power because of the small sample sizes and/or the small regions sequenced. Although we cannot exclude this possibility, the length (approximately 5 kb) sequenced from the four genes proved sufficient to detect departures from neutrality in SLC45A2 in the European population (data not shown).
A fourth possibility is that the distributions that we computed for each statistic under neutrality do not represent the true underlying distribution for the human species. Parameters of the demographic events need to be defined a priori, which in humans is challenging because of the complex history of migrations, admixture, expansions and bottlenecks. The differences seen in the values of the parameters could indeed be a major source of variation. The ENCODE data we used as an alternative are hampered by the fact that the regions considered were ascertained on the basis of their genomic peculiarities, and they may not be representative of the genetic variability of the genome. There has been progress in resequencing entire genomes (for example, the 1000 Genomes Project; http://www.1000genomes.org/page.php); however, current projects rely on combining low-coverage data from multiple samples, and are not able to produce the accurate sequence for each genome that is needed for such comparisons.
The fifth, and perhaps most likely, reason for the discrepancies we observed here between LRH and sequence-based tests may be the different assumptions of the evolutionary models underlying the definition of each statistic (that is, instantaneous versus incomplete selective sweeps), and the evolutionary timescale over which each type of test can recover departures from neutrality. In that case, our results might indicate extremely recent selection in the pigmentation genes, which would be recovered by haplotype-based but not sequence-based tests.
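The dependence of statistical significance on the chosen neutral distribution, raised in the fourth point above, can be illustrated with a minimal sketch. It is not the procedure used in this study: the function name `empirical_p_value`, the observed value and both null distributions are hypothetical, and the nulls are drawn from normal distributions purely as stand-ins for values simulated under two different demographic models.

```python
import random
from typing import Sequence


def empirical_p_value(observed: float,
                      null_values: Sequence[float],
                      tail: str = "lower") -> float:
    """One-sided empirical p-value of `observed` against simulated null values.

    Uses the (r + 1) / (n + 1) estimator, so the result is never exactly zero.
    """
    if not null_values:
        raise ValueError("null_values must not be empty")
    if tail == "lower":
        # Count null values at least as extreme (as low) as the observation.
        r = sum(1 for v in null_values if v <= observed)
    elif tail == "upper":
        # Count null values at least as extreme (as high) as the observation.
        r = sum(1 for v in null_values if v >= observed)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return (r + 1) / (len(null_values) + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical observed value of a neutrality statistic (illustration only).
    observed_stat = -1.8
    # Stand-ins for statistics simulated under two demographic models;
    # real nulls would come from coalescent simulations, not a normal draw.
    null_model_a = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    null_model_b = [rng.gauss(-0.8, 1.0) for _ in range(10_000)]
    print("p under model A:", empirical_p_value(observed_stat, null_model_a))
    print("p under model B:", empirical_p_value(observed_stat, null_model_b))
```

Because the second hypothetical null is shifted towards negative values, the same observed statistic is less extreme under it, which is the kind of sensitivity to the neutral model described above.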
We also used a Bayesian approach to estimate selective parameters of the populations putatively under positive selection in each gene. The demographic parameters were on average in concordance with those described in previous studies [43–46]; however, it should be noted that they are not entirely comparable, because the models and data differ in a large number of assumptions. Despite this technical limitation of the approach, the estimates of the time when selection started and of the mode of inheritance correlated well with expectations in the case of SLC45A2, regardless of whether the complete 10 kb sequence or a subsample of 5 kb was used (data not shown). To our knowledge, this is the first time this known selective sweep has been quantified in such a way. However, for OCA2, TYRP1, DCT, KITLG and the simulated neutral region, the strong resemblance between the prior and posterior distributions suggests that the posteriors are dominated by the priors rather than by the information contained in the genetic data.
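One simple way to put a number on the resemblance between a prior and a posterior is a histogram overlap coefficient computed from samples of each. The sketch below is only an illustration under stated assumptions: the function name `overlap_coefficient`, the fixed number of bins and the shared-range binning are hypothetical choices, not the diagnostic used in this study.

```python
from typing import List, Sequence


def overlap_coefficient(prior: Sequence[float],
                        posterior: Sequence[float],
                        bins: int = 20) -> float:
    """Histogram overlap between two samples, from 0 (disjoint) to 1 (identical).

    Both samples are binned on a shared range; the overlap is the sum over
    bins of the smaller of the two bin proportions.
    """
    if not prior or not posterior:
        raise ValueError("both samples must be non-empty")
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo = min(min(prior), min(posterior))
    hi = max(max(prior), max(posterior))
    if hi == lo:
        # Every value is identical, so the two samples coincide.
        return 1.0
    width = (hi - lo) / bins

    def proportions(samples: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in samples:
            # Clamp so the maximum value falls in the last bin.
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(samples) for c in counts]

    p = proportions(prior)
    q = proportions(posterior)
    return sum(min(a, b) for a, b in zip(p, q))
```

A value close to 1 would correspond to the situation described above, in which the data add little information beyond the prior, whereas a value well below 1 would indicate that the posterior has moved away from the prior. The result depends on the number of bins and on the sample sizes, so it is best read as a rough indicator.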