High diversity and no significant selection signal of human ADH1B gene in Tibet

Background: ADH1B is one of the most studied human genes and has many polymorphic sites. One of its single nucleotide polymorphisms (SNPs), rs1229984, which codes for the Arg48His substitution, has been associated with many serious diseases, including alcoholism and cancers of the digestive system. The derived allele, ADH1B*48His, reaches high frequency only in East Asia and Southwest Asia, and is highly associated with agriculture. A micro-evolutionary study has defined seven haplogroups for ADH1B based on seven SNPs encompassing the gene. Three of those haplogroups, H5, H6, and H7, contain the ADH1B*48His allele. H5 occurs in Southwest Asia and the other two are found in East Asia. H7 is derived from H6 by the derived allele of rs3811801. The H7 haplotype has been shown to have undergone significant positive selection in Han Chinese, Hmong, Koreans, Japanese, Kazakhs, Mongols, and other populations.
Methods: In the present study, we tested whether Tibetans also show evidence for selection by typing 23 SNPs in the region covering the ADH1B gene in 1,175 individuals from 12 Tibetan populations representing all districts of the Tibet Autonomous Region. Multiple statistics were estimated to examine the gene diversities and positive selection signals among the Tibetans and other populations in East Asia.
Results: The larger Tibetan populations (Qamdo, Lhasa, Nagqu, Nyingchi, Shannan, and Shigatse), which comprise mostly farmers, have around 12% H7 and 2% H6. The smaller populations, which live by hunting or have recently switched to farming, have lower H7 frequencies (Tingri 9%, Gongbo 8%, Monba and Sherpa 6%). Lhoba (2%) and Deng (0%) have even lower frequencies. Long-range haplotype analyses revealed very weak signals of positive selection for H7 among Tibetans. Interestingly, the haplotype diversity of H7 is higher in Tibetans than in any other population studied, indicating a longer diversification history for that haplogroup in Tibetans. Network analysis of the long-range haplotypes revealed that H7 in the Han Chinese did not come from the Tibetans but from a common ancestor of the two populations.
Conclusions: We argue that H7 of ADH1B originated in the ancestors of Sino-Tibetan populations and flowed to Tibetans very early. However, because Tibetans depend less on crops, they were not significantly affected by selection. Thus, H7 has not risen to a high frequency, whereas the diversity of the haplogroup has accumulated to a very high level.


Background
The alcohol dehydrogenase (ADH) gene family has seven members expressed in different organs and tissues; ADH1B is expressed mostly in the liver and lungs [1], where most alcohol is dehydrogenated [2]. Therefore, ADH1B can be considered the most important member of the ADH family, and it has become one of the most studied human model genes for natural selection [3,4]. A single nucleotide polymorphism (SNP), ADH1B Arg48His (rs1229984), results in large functional differences between the enzymes encoded by the ancestral and derived alleles. The catalytic activity of the enzyme encoded by the derived allele is 40 times that of the ancestral allele [5]. Thus, this SNP has been found to be relevant to cancers of the digestive and respiratory systems, alcoholism, addiction, and many other disorders [6][7][8][9][10].
The allele frequency of ADH1B*48His varies greatly among the world's populations. The derived allele reaches high frequency only in eastern East Asia and Southwest Asia and is almost absent in the rest of the world [3]. Further study revealed that several SNPs in ADH1B form different haplotypes, and haplotypes with the ADH1B*48His allele are different in East Asia and Southwest Asia [4], with evidence in East Asia of strong positive natural selection [11,12] during the history of agriculture [13]. In western East Asia, the allele frequency of ADH1B*48His is not high [12], especially among Tibetans [4]. Tibetans share very recent common ancestors with the Han Chinese. Important questions are whether the ADH1B*48His alleles of Tibetans and Han Chinese have a common origin and, if so, why the frequency of this allele did not rise in Tibetans as it did in Han Chinese.
Our previous study found seven haplogroups (H1 to H7) for ADH1B among the world populations [14] based on seven SNPs in the gene. The ADH1B*48His allele appears in H5, H6, and H7. H5 is a Southwest Asian haplogroup. H6 is derived from a crossover involving H5 and occurs primarily in East Asia and the Pacific region. H7 is derived from H6 by the addition of the derived allele of rs3811801 in the regulatory region of ADH1B. The age of H6 is about 15,000 to 21,000 years [14], which is about the age of the modern East Asians [15][16][17]. H7 expanded only about 2,800 years ago, and it is the only haplogroup with a strong signal of selection [14]. The frequency of H7 is much higher than that of H6 in Han Chinese [12,18]. No study has previously investigated the distribution of the ADH1B haplogroups in Tibetan populations.
The languages of the Tibetans and Han Chinese belong to the Sino-Tibetan linguistic family. DNA evidence generally supports the hypothesis that populations speaking similar languages have recent common ancestors, especially in East Asia [19][20][21]. Y-chromosome DNA analyses argue that the divergence of Han Chinese and Tibeto-Burman populations was no earlier than 6,000 years ago [20,22,23].
There are five populations living in Tibet: Tibetans, Sherpa, Monba, Lhoba, and Deng [24]. The Tibetans are the major population of Tibet, divided into three major branches: Weizang in central Tibet, Amdo in the north, and Khams in the east. Two minor Tibetan populations, Tingri and Gongbo, are yet to be classified into the three branches. Monba, Lhoba, and Deng are all in southeast Tibet. Their languages are mostly in the North Assam branch of Tibeto-Burman, while three-quarters of the Monba people use dialects mixed with the Tibetan language. The Sherpa people live in the middle of the Himalayas, an area overlapping with China, Nepal, Sikkim, and Bhutan, and speak a language very close to Tibetan [25]. In this paper, we investigated Class I ADH and ADH7 haplogroup diversity among all populations in Tibet, and examined the diversity and selection signal of ADH1B.

Statistics
Allele frequencies of two core SNPs, rs1229984 and rs3811801, were estimated from the genotypes by simple gene counting, assuming co-dominant inheritance and the absence of null alleles. Geographic distributions of the allele frequencies of these two SNPs in East Asia were transformed into contour maps using Surfer 8.0 (Golden Software, Golden, CO, USA). Both our results and data from the literature [11,12,14,27,28,29,30,31,32,33,34,35] were included in the maps; we used the Kriging method for data interpolation.
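Gene counting as described above can be illustrated with a minimal Python sketch. The function name, the input layout (one pair of alleles per individual), and the example genotypes are assumptions made for illustration; they are not taken from the study's pipeline or data.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies by simple gene counting.

    genotypes: iterable of 2-tuples of alleles, one per diploid individual,
    e.g. ("C", "T"). Missing genotypes are given as None and skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue
        counts.update(genotype)  # each individual contributes two alleles
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes were observed")
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical genotypes at one SNP (not data from this study).
sample = [("C", "C"), ("C", "T"), ("T", "T"), ("C", "C"), None]
freqs = allele_frequencies(sample)
print(freqs)
```

Under co-dominant inheritance every genotype can be read directly, so counting alleles over all typed chromosomes is sufficient; null alleles would break that assumption.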
Haplotypes of the 23 SNPs were determined using PHASE 2.1 [36,37]. To make the haplotype estimation more reliable, our previous data of 4,362 chromosomes from 47 populations from other regions of the world were included as references [14]. ADH1B haplogroups were then determined according to the definitions of H1 to H7 [14].
High extended linkage disequilibrium among the SNPs in the relevant genomic region might be a signal of selection [40]. We used the long-range haplotype (LRH) test to examine the linkage disequilibrium and potential positive selection on the core haplotypes (rs1229984-rs6810842-rs3811801) of ADH1B [41]. Both extended haplotype homozygosity (EHH) and relative EHH (REHH) [42][43][44] were calculated in the LRH tests. The integrated haplotype score (iHS) test was also employed to test for positive selection [44].
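For readers unfamiliar with EHH, the following Python sketch follows the standard definition: among chromosomes that carry a given core haplotype, EHH out to a given marker is the probability that two randomly chosen chromosomes are identical over the whole interval. The function name, the string encoding of haplotypes, and the example data are assumptions for illustration; this is not the software used in the study.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, start, end):
    """Extended haplotype homozygosity over marker columns [start, end).

    haplotypes: list of equal-length strings, all carrying the same core
    haplotype; the interval should cover the core plus the flanking markers
    out to the distance of interest.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("EHH needs at least two chromosomes")
    groups = Counter(h[start:end] for h in haplotypes)
    # Pairs of identical chromosomes divided by all possible pairs.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)

# Hypothetical carriers of one core haplotype (column 0 is the core).
carriers = ["1000", "1000", "1001", "1010"]
decay = [ehh(carriers, 0, end) for end in range(1, 5)]
print(decay)
```

EHH starts at 1 at the core and decays as flanking markers split the carriers into distinct extended haplotypes; REHH then compares this decay for one core haplotype against the other core haplotypes at the same locus.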
Principal component analysis was employed to assess the population relationships within the gene region. To identify the genetic barriers among the populations, pair-wise Fst values were estimated and the Barrier 2.2 software [45] was used.
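The text does not state which Fst estimator was used, so the sketch below assumes the simplest textbook form for one biallelic locus and two equally weighted populations, Fst = (H_T - H_S) / H_T. The function names, population labels, and frequencies are hypothetical.

```python
from itertools import combinations

def pairwise_fst(p1, p2):
    """Wright's Fst for one biallelic locus and two populations.

    p1, p2: frequencies of the same allele in the two populations.
    Populations are weighted equally (an assumption of this sketch).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                    # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # locus is monomorphic in both populations
    return (h_total - h_within) / h_total

def fst_matrix(frequencies):
    """Return {(pop_a, pop_b): Fst} for every pair of populations."""
    return {
        (a, b): pairwise_fst(frequencies[a], frequencies[b])
        for a, b in combinations(sorted(frequencies), 2)
    }

# Hypothetical allele frequencies (not data from this study).
example = {"PopA": 0.12, "PopB": 0.60, "PopC": 0.12}
matrix = fst_matrix(example)
print(matrix)
```

A matrix of this kind is the usual input for barrier detection, which looks for geographic boundaries across which the pairwise distances are largest.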

Haplogroup frequencies and diversities
We estimated 671 different haplotypes considering all 23 SNPs in all individual Tibetan samples (Table S1). These haplotypes were classified into 13 haplogroups for ADH1B (Table 1). There are three haplogroups containing the ADH1B*48His allele, H5, H6, and H7. H5 is very rare in Tibet. The frequency of H6 among the major Tibetan populations is only around 2%, and is absent in some minor populations. The frequency of H7 is higher than that of H6 in all populations, reaching around 12% in major Tibetan populations, and lower in the minor populations. ADH1B*48His is totally absent in Deng. A new haplogroup, H7b, was defined with the ancestral allele of rs1229984 and derived allele of rs3811801. This new haplogroup is almost absent outside of Tibet.
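The relationship among the 48His-bearing haplogroups and H7b can be summarized by the states of the two core SNPs. The Python sketch below is a deliberately coarse illustration: the function name and labels are assumptions, and separating H5 from H6 (or resolving the remaining haplogroups) requires additional SNPs that are not modeled here.

```python
def classify_core_haplogroup(rs1229984_derived, rs3811801_derived):
    """Assign a coarse ADH1B haplogroup label from the two core SNPs.

    Both arguments are booleans: True means the chromosome carries the
    derived allele at that SNP. The labels are illustrative only.
    """
    if rs1229984_derived and rs3811801_derived:
        return "H7"
    if rs3811801_derived:
        return "H7b"  # ancestral rs1229984, derived rs3811801
    if rs1229984_derived:
        return "H5/H6"  # 48His without rs3811801; more SNPs needed to split
    return "other"

# Hypothetical chromosomes: (rs1229984 derived?, rs3811801 derived?)
chromosomes = [(True, True), (True, False), (False, True), (False, False)]
labels = [classify_core_haplogroup(a, b) for a, b in chromosomes]
print(labels)
```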
To assess the geographic distributions of ADH1B*48His (rs1229984*T) and rs3811801*A in western East Asia, we transformed the allele frequencies into contour maps (Figure 2A and B). The frequency of ADH1B*48His shows a clear decrease from east to west (Pearson correlation between longitude and the frequencies of rs1229984*T: r = 0.617, P = 2.27 × 10^-5). The decrease to the west is smoother in the north than in the south. In Tibet, the frequency decreases slightly from north to south. This might indicate a migration of the Tibetans from north to south. The distribution of rs3811801*A is similar to that of ADH1B*48His but at lower frequencies (Pearson correlation between longitude and the frequencies of rs3811801*T: r = 0.673, P = 1.04 × 10^-15).
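The longitude correlation reported above is an ordinary Pearson coefficient computed across populations. A minimal Python sketch follows; the function name and the longitude and frequency values are hypothetical and are not the study's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical populations: longitude (degrees east) and allele frequency.
longitudes = [85.0, 95.0, 105.0, 115.0]
frequencies = [0.05, 0.20, 0.45, 0.70]
r = pearson_r(longitudes, frequencies)
print(round(r, 3))
```

A positive r with longitude corresponds to a frequency that falls from east to west, which is the pattern described for both SNPs.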
Gene diversity was estimated from the haplotypes of 23 SNPs among haplogroup H7 and then transformed into a contour map (Figure 2C). Although H7 does not reach high frequency in Tibet, the within-haplogroup diversity is very high. We also estimated the gene diversity of H7 within each linguistic family (Table 2). The total diversity of the Tibetans is the highest among all families.
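The estimator behind the gene diversity values is not spelled out in this section. The Python sketch below assumes Nei's unbiased estimator, H = n/(n-1) × (1 - Σ p_i²), where p_i is the frequency of the i-th distinct haplotype among n chromosomes; the function name and example haplotypes are hypothetical.

```python
from collections import Counter

def gene_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a sample of chromosomes.

    haplotypes: list of hashable haplotype labels or strings, one per
    chromosome. Returns a value between 0 (all identical) and 1.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("diversity needs at least two chromosomes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)

# Hypothetical H7 chromosomes described by their extended haplotypes.
h7_sample = ["a", "a", "b", "c"]
diversity = gene_diversity(h7_sample)
print(round(diversity, 4))
```

Because the statistic depends only on how evenly chromosomes are spread across distinct haplotypes, a haplogroup can be rare in a population and still show high within-haplogroup diversity.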

Long-range haplotype networks
Similarities among the haplotypes can predict population relationships. We performed network analyses to assess the haplotype similarities (Figure 3). Haplogroup H5 is a West Asian type [14]. It is almost absent in East Asia, but appears in low frequency in Southeastern Asia and Northern Asia. We also found some H5 chromosomes in Tibet. According to the network, haplotypes of these samples were closely related to the Northern Asians, possibly indicating gene flow from Northern Asia to Tibet or from a common ancestor into both regions. This phenomenon also appears in the H7 network. The major Tibetan populations share some common haplotypes with other populations, while the minor populations have unique haplotypes. In the H7 network, the major Tibetan populations also have some unique haplotypes different from the haplotypes unique to Han Chinese. Therefore, we conclude that H7 haplotypes experienced different recent histories in the Tibetans and Han Chinese. The H7 haplotypes in Tibetans did not originate from Han Chinese but from the common ancestral population of Sino-Tibetan people.

Positive selection test
The frequency of the youngest haplogroup H7 in Tibet is much higher than that of H6, suggesting positive selection might have increased the frequency of H7. We performed LRH analyses on the ADH1B gene to test for selection signals among populations in Tibet. LRH analysis has good performance for positive selection on low frequency alleles (approximately 10%) [46]. We calculated both EHH and REHH values of the core haplotype rs1229984-rs6810842-rs3811801 (Figure 4). EHH values of most haplotypes decreased rapidly from the core haplotype except for that of the haplotype with both derived alleles of rs1229984 and rs3811801 (H7). However, REHH did not show significant signals of selection for any haplotypes.
In addition, we performed the iHS test, which compares the integrated extended haplotype homozygosity (iHH) of the ancestral allele with the iHH of the derived allele for each SNP examined. However, no signals of positive selection were observed (Additional file 2: Figure S1). Therefore, we conclude that ADH1B H7 in Tibet has undergone only very weak, if any, positive selection.
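The comparison underlying iHS can be sketched in Python as follows. iHH is taken here as the area under the EHH decay curve (trapezoidal rule), and the unstandardized score is ln(iHH_ancestral / iHH_derived). The function names, positions, and EHH values are hypothetical; a full analysis would also integrate over genetic distance in both directions, truncate where EHH becomes small, and standardize the score against SNPs of similar derived-allele frequency.

```python
from math import log

def integrate_ehh(positions, ehh_values):
    """Area under an EHH decay curve by the trapezoidal rule (iHH)."""
    if len(positions) != len(ehh_values) or len(positions) < 2:
        raise ValueError("need matching positions and EHH values")
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += width * (ehh_values[i] + ehh_values[i - 1]) / 2
    return area

def unstandardized_ihs(ihh_ancestral, ihh_derived):
    """ln(iHH_A / iHH_D); negative when the derived allele sits on
    unusually long haplotypes."""
    return log(ihh_ancestral / ihh_derived)

# Hypothetical EHH decay on one side of a core SNP (positions in kb).
positions = [0.0, 10.0, 20.0]
ihh_a = integrate_ehh(positions, [1.0, 0.5, 0.0])   # ancestral allele
ihh_d = integrate_ehh(positions, [1.0, 0.9, 0.8])   # derived allele
ihs = unstandardized_ihs(ihh_a, ihh_d)
print(ihh_a, ihh_d, round(ihs, 4))
```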

Population relationships for ADH1B region
To assess the population relationships within the ADH1B region, we performed principal component analysis based on the estimated haplotype frequencies of the populations (Figure 5). In the first component, the Tibetans and Han Chinese are clearly distinguished, while the Qiang populations are between the Tibetans and Han Chinese. The Deng and Sherpa are obvious outliers from the central Tibetan populations, with the Sherpa closer to the Qiang.
We also calculated the population pair-wise Fst values and used the Barrier 2.2 software to identify genetic barriers among the populations.
In East Asia, the ADH1B gene is one of the genes whose diversity is correlated with ethnic classifications. Frequencies of ADH1B haplogroups are very different among different ethnic groups (linguistic families) [12]. Compared to the high frequency of H7 in Han and Hmong Chinese, the frequency of H7 is rather low in the Tibetans (approximately 12%) and other populations (approximately 5%) in Tibet. However, the haplotype diversity of H7 reaches the highest value in the Tibetans, indicating a long history of this haplogroup in Tibet. Network analysis showed that most H7 haplotypes in the Tibetans have quite different flanking sequences from those that occur in Han Chinese. Thus, H7 has diverged in these two populations for a long time, and the origin of H7 might not be in either of the populations. The Tibetans and Han Chinese both speak Sino-Tibetan languages. Genetic and linguistic studies indicate that these two ethnic groups originated from common ancestors in the upper reaches of the Yellow River about 6,000 years ago [20,23]. ADH1B H7 might have come from the common ancestors of the Sino-Tibetan populations. Historical records say that the Tibetans came from the ancient Qiang people [47], who were the original population of Sino-Tibetan people. In our present Qiang sample, the H7 haplotype diversity is not the highest, but the average nucleotide diversity is the highest, indicating a great age of H7 in the Qiang. However, our Qiang sample is from only one of the various Qiang populations [48][49][50]. Other Qiang populations should be included in future investigations to provide a better, more detailed evolutionary history of the ADH1B gene in East Asia.
Why is there no signal of selection on ADH1B H7 in Tibet?
Signals of selection on ADH1B H7 are strong in Han Chinese, Japanese, Koreans, and Hmong. In ADH1B H7, both the non-synonymous rs1229984 and the regulatory-region rs3811801 carry derived alleles. We did not find samples with only the derived allele of rs3811801 in the previous studies [12], and therefore we cannot be sure whether the derived allele of rs1229984 is sufficient to explain the selection. In this study, we found a new haplotype, H7b, with only the derived allele of rs3811801 in the Tibetans, but we are not yet sure whether both derived alleles are necessary for selection, as the diversity of H7b is too high.
ADH1B H7 was derived from ADH1B H6 [14]. In those East Asian populations lacking selection signals at ADH1B, the frequencies of H6 are all much higher than those of the derived haplogroup H7. In Han Chinese, Japanese, and other populations, the frequency of H7 is much higher than that of H6 and appears to have increased rapidly as the result of selection [11,12]. In Tibetans, the frequency of H7 is also much higher than that of H6, which could also suggest positive selection. However, the LRH test revealed only a weak, non-significant signal of selection in the Tibetans. That may also explain why the frequency of H7 in Tibetans is not as high as in Han Chinese.
The selection of the 48His variant of ADH1B in East Asia appears to be related to agriculture, most likely to rice domestication [13]. In Tibet, the major lifestyle is not crop farming but stock farming [51], and the few crops in Tibet are highland barley and millet [52], not rice. This might be the reason selection on ADH1B is not detectable in Tibet. Furthermore, crops can be stored better on the cold highland than on the warm plain. While there is no definitive explanation of what characteristic was the basis of selection, these data are consistent with one hypothesis related to toxins from decomposition during storage [53]: fewer toxins would be generated during crop storage in Tibet, and therefore the selective force on the ADH1B gene would be small to absent.

Conclusions
The diversification of ADH1B in the Sino-Tibetan populations has a long history. Haplogroup H7 of ADH1B originated in the ancestor of Sino-Tibetan populations and flowed to the Tibetans very early. However, as the Tibetans have a lifestyle less dependent on crops, selection has not had significant effects, and H7 has not risen to a high frequency, whereas the diversity of the haplogroup has accumulated to a very high level.