Modeling the contrasting Neolithic male lineage expansions in Europe and Africa
© Sikora et al.; licensee BioMed Central Ltd. 2013
Received: 16 August 2013
Accepted: 21 October 2013
Published: 21 November 2013
Skip to main content
© Sikora et al.; licensee BioMed Central Ltd. 2013
Received: 16 August 2013
Accepted: 21 October 2013
Published: 21 November 2013
Patterns of genetic variation in a population carry information about the prehistory of the population, and for the human Y chromosome an especially informative phylogenetic tree has previously been constructed from fully-sequenced chromosomes. This revealed contrasting bifurcating and starlike phylogenies for the major lineages associated with the Neolithic expansions in sub-Saharan Africa and Western Europe, respectively.
We used coalescent simulations to investigate the range of demographic models most likely to produce the phylogenetic structures observed in Africa and Europe, assessing the starting and ending genetic effective population sizes, duration of the expansion, and time when expansion ended. The best-fitting models in Africa and Europe are very different. In Africa, the expansion took about 12 thousand years, ending very recently; it started from approximately 40 men and numbers expanded approximately 50-fold. In Europe, the expansion was much more rapid, taking only a few generations and occurring as soon as the major R1b lineage entered Europe; it started from just one to three men, whose numbers expanded more than a thousandfold.
Although highly simplified, the demographic model we have used captures key elements of the differences between the male Neolithic expansions in Africa and Europe, and is consistent with archaeological findings.
Around 50 to 70 thousand years ago (approximately 60 KYA), modern humans expanded out of Africa, and by approximately 15 KYA had colonized all inhabitable continents . During most of this period, the climate was both cold and unstable, but after approximately 10 KYA (the beginning of the Holocene period) it warmed and stabilized to produce the climate we know today. Early humans subsisted by hunting and gathering, but in the Holocene additional lifestyles became possible, including agriculture and pastoralism. This ‘Neolithic transition’ occurred independently at different times during the Holocene in different geographical regions. One Neolithic transition began in the Fertile Crescent in the Near East approximately 10 KYA and spread outwards in several directions, including into Europe over the course of several thousand years . In sub-Saharan Africa, a comparable transition began later, approximately 3 KYA in West Africa, and spread south and east, reaching the extreme south only within historical times . This differed from the transition in Europe in a number of respects: for example, there was no change in stone tool technology or use of copper or bronze, but instead a direct transition from the Later Stone Age to iron use, and some archaeologists therefore consider it inappropriate to use the term ‘Neolithic’ , but we retain it here because it is simple and widely understood. Both transitions were associated with large increases in population size.
Genetic evidence has contributed to our understanding of these events. There has been debate about the extent to which the genomes of present-day inhabitants of these areas have been derived from Neolithic farmers or from Paleolithic hunter-gatherers. The first large-scale molecular-genetic analyses in Europe were based on mitochondrial DNA (mtDNA) from present-day Europeans and were interpreted as favoring a Paleolithic entry for the majority of European mtDNAs . More direct tests of this question, however, using ancient DNA (aDNA), have revealed a discontinuity between hunter-gatherer and early farmer mtDNAs, suggesting a Neolithic or later entry for the lineages that are most common today [5–8]. Similarly, low-coverage whole-genome sequencing supported the idea of a southern origin for early farmers from northern Europe [9, 10], and thus migration and expansion of incoming Neolithic populations to replace the previous occupants.
The Y chromosome has several properties that make it potentially very informative about historical events, including the Neolithic transition. Its lack of recombination over most of its length means that it provides the most detailed and informative phylogenetic tree for any locus in the genome, while as a consequence of its strict father-to-son transmission it carries information specifically about male events . Y-chromosomal lineages differ substantially between geographical regions and in each of the two areas considered here a single lineage predominates: R1b (especially the sublineage defined by the SNP M269, rs9786153) in Western Europe [12, 13] and E1b1a (defined by the SNP known variously as M2, sY81, DYS271 or rs9785941) in sub-Saharan Africa . While these observed geographical distributions are uncontested, and E1b1a has been widely associated with the Neolithic expansion in Africa [15, 16], the time depth of R1b in Europe has been disputed, with opinions ranging from a Paleolithic date  to a Neolithic one . aDNA has not yet been very informative for the Y chromosome, although the limited data available show no evidence of pre-Neolithic R1b lineages . Full sequences from the Y chromosomes of present-day individuals, however, have recently become available, and these support a Neolithic spread of R1b . In addition, the tree structure resulting from these sequences, based on the unbiased ascertainment of variants, is informative in other ways. There is a striking difference in the structure of the E1b1a and R1b phylogenies: R1b has a starlike structure indicative of an expansion so rapid that few mutations occurred during the expansion, while E1b1a has a more regular bifurcating structure.
In the current study, we accept R1b and E1b1a as lineages that expanded during the Neolithic, and set out to explore, using coalescent simulations, the demographic conditions under which their different phylogenetic structures might be expected to arise. We found that these differ between the two continents, and link our conclusions to the available archaeological evidence.
The samples consisted of 21 high-coverage Y-chromosomal sequences downloaded from the Complete Genomics website , eight from the E1b1a haplogroup and 13 from the R1b haplogroup. Filtering of the data and generation of a phylogenetic tree from them have been described previously . Eight individuals within the R1b haplogroup were from a three-generation pedigree, so in the current work where the simulations assume individuals are unrelated, this pedigree was combined to make a single branch by averaging the number of distinct SNPs in each family member and adding this value to the number of SNPs shared by all of the individuals.
Simulations were performed using MaCS , a coalescent simulator, using six and eight haplotypes for the R1b and E1b1a data, respectively, with a sequence length of 8.8 × 106 nucleotides, assuming a generation time of 30 years , a mutation rate of 3 × 10-8 per nucleotide per generation  and zero recombination. The simulations explored the parameters of a single population expansion using four variables: the starting and final population sizes, the time when the expansion ended, and the length of the expansion. Examples of the command lines used are provided in Additional file 1: Table S2.
where b is a tree branch of length l b , which has n BEN branches of length l bi beneath its node, n TER is the number of terminal branches and n INT is the number of internal branches.
The other two statistics were calculated by determining the branch length of the TMRCA of each combination of the individual haplotypes and computing the mean and standard deviation. The three statistics thus reflect both the time depth of the tree and how starlike its structure is.
where the subscript s indicates a simulated value, o an observed value, r a singleton/shared ratio statistic, m a mean TMRCA statistic and d a standard deviation of a TMRCA statistic.
A low AND value thus indicates a good fit to the empirical data. We completed 1,000 simulations for each demographic scenario and averaged each statistic to use as the simulated value.
The ranges for the parameters on the first set of simulations and corresponding heat map were each chosen to be very wide, including all reasonable estimates for their values (Additional file 2: Table S1). The parameter ranges for the time the expansion ended and the length of the expansion were each extended past the empirical TMRCA for each respective haplogroup. For each successive heat map, a conservative selection of the lowest AND values was noted and the ranges for the following set of simulations chosen to include these, unless their TMRCAs were not compatible with the maximum TMRCA of the haplogroup. Thus we sequentially removed parameter values that resulted in large AND values, progressively narrowing the range until it encompassed only AND values of 0.05 and below. Although these do not provide an absolute measure of how well the model fits the data, they show that among the wide ranges of parameters explored, these are the best fits. Then, a histogram for each parameter was created using the frequency of sub-0.05 AND values, to provide an indication of our conclusions regarding this parameter value.
The model we have explored, involving a single exponential expansion, is grossly simplified. In addition, we have analyzed within each population a single lineage (R1b or E1b1a) of a single locus (the Y chromosome), and this may not be representative of the population. Nevertheless, there are several reasons to believe that our results should capture features of interest. First, the male history represented by the Y chromosome is of interest whether or not it corresponds to the history of other regions of the genome. Second, the single Y lineages we examined are the most frequent in their respective geographical regions, being found in >75% and >80% of males from many Western European and sub-Saharan African populations, respectively, so form a major constituent of the Y-chromosomal gene pool. Furthermore, the chromosomes sampled within each of the two lineages have diverse geographical origins: the R1b chromosomes come from the CEU (Northwestern Europe ), TSI (Italy), PUR and MXL (probably Iberia) populations, while the E1b1a chromosomes come from the YRI (Nigeria), LWK (Kenya) and ASW (probably West Africa) populations. Thus their origins are not confined to any one country or small geographical area, and are likely to be broadly representative of these lineages. Third, the Y phylogenies, based on resequencing approximately 9 Mb of Y-chromosomal DNA, are very robust, especially in this high-coverage dataset where singletons will be called reliably. Consequently the R1b chromosomes in this set, for example, must have radiated in an interval so short that there was only enough time for a single mutation to occur, no matter how complex the migrations, integrations or replacements and other cultural changes going on in the society carrying these chromosomes. Fourth, although only a portion of the parameter space has been explored within the model, and it remains possible (indeed, it is an inevitable feature of this approach), that an undiscovered global optimum with very narrow parameter values may exist, our sequential approach (Additional files 3: Figures S1 to S14) minimizes the chance of this, and we discuss below the good correspondence with other sources of information.
With these caveats, we can consider how the Y-chromosome-based genetic findings fit with other genetic and archaeological evidence. The Neolithic transition in Europe has been studied extensively by archaeologists. It appeared in Greece approximately 9 KYA and reached the extreme west by approximately 4 KYA [1, 2]. The demographic model suggests that the R1b expansion most likely ended before this time, at approximately 12 KYA (Figures 4 and 5), which appears inconsistent with a Neolithic expansion of this lineage, although the lower limit does extend to approximately 6 KYA. We interpret the discrepancy, however, as a limitation of the model. We constrained the parameter values so that R1b could not expand before the estimated TMRCA of the sampled R1b chromosomes , and the model favored an immediate expansion of the lineage, hence the expansion at approximately 12 KYA. If we had used the more likely 4 to 5 KYA estimate of the R1b TMRCA from the rho statistic , the expansion in the current model would have been placed close to this time, well within the Neolithic and, interestingly, also close to the time of establishment of the major European mtDNA haplogroup, H, approximately 6 KYA [7, 8]. The rapidity of the R1b expansion and the large increase in population size are most consistent with migration and population replacement, issues debated by archaeologists but favored by the aDNA data [5–9]. The later and more gradual E1b1a expansion in Africa is as expected from the spread of cattle-herders from the north between 2.5 and 8 KYA, followed by the Bantu expansion to the southern tip of the continent beginning approximately 2.5 KYA and ending within the last few hundred years, incorporating the package of Bantu languages, cattle and iron-working [1, 3]. The population sizes used by the model are genetic effective population sizes, which, for a population that has expanded recently, are much smaller than the census population size .
Studies of this kind can be improved by considering more complex demographic models and larger Y-chromosomal datasets. While it may seem obvious that more complex and thus more realistic models should be preferable, models are only useful if the different scenarios they encompass can be discriminated between using the data available, so the simplest model that captures a relevant aspect of the data may still be the most appropriate one. Thus while future models in this context could incorporate spatial structure and phenomena such as surfing , a single rapid expansion should still be permitted. We have modeled only a single Y haplogroup, because in each expansion a single haplogroup predominates. Low-coverage sequencing of larger population samples by the 1000 Genomes Project [26, 27] and two recent studies focusing on Africa  and Sardinia  confirm both the high frequencies of haplogroups R1b and E1b1a in the relevant populations and the structures of the phylogenetic trees associated with them. These projects thus provide much larger datasets, which could be used in future modeling studies, although the low coverage and substantial false negative rates of rare variants would need to be taken into account. With such data, the additional rare Y haplogroups present in the populations could also be considered. Different studies have come to different conclusions about the Y-chromosomal mutation rate [22, 28, 29]; in the current study, the mutation rate is used simply to scale the results, and a mutation rate about half  of that used here , for example, would double the times. Finally, we note that such analyses of single lineages, which may have deep coalescences, contrast with the universal sharing of recent genealogical ancestors by all people within the last few thousand years .
We have identified demographic scenarios that can lead to the contrasting phylogenies observed for the major Y-chromosomal lineages that expanded during the distinct Neolithic transitions in Europe and Africa. These suggest that in Europe, the R1b lineage experienced an extremely rapid and extensive increase as soon as it entered the continent, expanding more than a thousandfold in a few generations. The expansion in Africa began from a larger population size, took thousands of years and ended only recently. While these conclusions are based on a simplified demographic model, they capture major differences between the continents and fit many aspects of the archaeological findings.
Average normalized delta
Thousand years ago
Single nucleotide polymorphism
Time to the most recent common ancestor.
This work was supported by the Wellcome Trust (WT098051); MJS was supported by an International Thesis Research Grant from the Schreyer Honors College at The Pennsylvania State University.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.